Re: [Qemu-devel] [PATCH v7 0/5] virtio-balloon: free page hint reporting support
On 06/01/2018 12:58 PM, Peter Xu wrote:
> On Tue, Apr 24, 2018 at 02:13:43PM +0800, Wei Wang wrote:
>> This is the device part implementation to add a new feature,
>> VIRTIO_BALLOON_F_FREE_PAGE_HINT to the virtio-balloon device. The device
>> receives the guest free page hints from the driver and clears the
>> corresponding bits in the dirty bitmap, so that those free pages are
>> not transferred by the migration thread to the destination.
>>
>> - Test Environment
>>     Host: Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
>>     Guest: 8G RAM, 4 vCPU
>>     Migration setup: migrate_set_speed 100G, migrate_set_downtime 2 second
>>
>> - Test Results
>>     - Idle Guest Live Migration Time (results are averaged over 10 runs):
>>         - Optimization v.s. Legacy = 271ms vs 1769ms --> ~86% reduction
>>     - Guest with Linux Compilation Workload (make bzImage -j4):
>>         - Live Migration Time (average)
>>           Optimization v.s. Legacy = 1265ms v.s. 2634ms --> ~51% reduction
>>         - Linux Compilation Time
>>           Optimization v.s. Legacy = 4min56s v.s. 5min3s
>>           --> no obvious difference
>>
>> - Source Code
>>     - QEMU:  https://github.com/wei-w-wang/qemu-free-page-lm.git
>>     - Linux: https://github.com/wei-w-wang/linux-free-page-lm.git
>
> Hi, Wei,
>
> I have a very high-level question to the series.

Hi Peter,

Thanks for joining the discussion :)

> IIUC the core idea for this series is that we can avoid sending some
> of the pages if we know that we don't need to send them. I think this
> is based on the fact that on the destination side all the pages are by
> default zero after they are malloced. Before this series, IIUC, any
> migration would send every single page to the destination, no matter
> whether it's zeroed or not. So I'm uncertain about whether this will
> affect the received bitmap on the destination side. Say, before this
> series, the received bitmap will directly cover the whole RAM bitmap
> after migration is finished; now it won't. Will there be any side
> effect? I don't see an obvious issue now, but just raise this question.

This feature currently only supports pre-copy (I think the received
bitmap is something that matters to post-copy only). That's why we have

    rs->free_page_support = ..&& !migrate_postcopy();

> Meanwhile, this reminds me of a funnier idea: whether we can just
> avoid sending the zero pages directly from QEMU's perspective. In
> other words, can we just do nothing if save_zero_page() detected that
> the page is zero (I guess is_zero_range() can be fast too, but I
> don't know exactly how fast it is)? And how would that differ from
> this page hinting approach, in performance and in other aspects?

I guess you are referring to the zero page optimization. I think the
major overhead comes from the zero page checking: lots of memory
accesses, which also waste memory bandwidth. Please see the results
attached in the cover letter. The legacy case already includes the zero
page optimization.

Best,
Wei
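The mechanism the cover letter describes (clear the dirty-bitmap bits of pages the guest reports as free, so the migration thread never picks them up) can be sketched roughly as below. This is only an illustration under stated assumptions, not the QEMU implementation: every name (`apply_free_page_hint`, `bitmap_test`, `bitmap_clear`, `SKETCH_PAGE_SHIFT`) is hypothetical, a 4 KiB page size is assumed, and hints are assumed to be page-aligned (address, length) pairs.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* All names below are hypothetical; this is not QEMU's actual API. */
#define SKETCH_PAGE_SHIFT 12   /* assumes 4 KiB guest pages */
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Return 1 if bit nr is set in a bitmap stored as unsigned long words. */
static int bitmap_test(const unsigned long *map, uint64_t nr)
{
    return (map[nr / BITS_PER_WORD] >> (nr % BITS_PER_WORD)) & 1UL;
}

/* Clear bit nr in the bitmap. */
static void bitmap_clear(unsigned long *map, uint64_t nr)
{
    map[nr / BITS_PER_WORD] &= ~(1UL << (nr % BITS_PER_WORD));
}

/*
 * Apply one free page hint: the guest reports [gpa, gpa + len) as free,
 * so the matching pages are dropped from the dirty bitmap and will not
 * be sent.  Pages beyond npages are ignored.  Returns the number of
 * bits that were actually cleared.
 */
static uint64_t apply_free_page_hint(unsigned long *dirty, uint64_t npages,
                                     uint64_t gpa, uint64_t len)
{
    uint64_t first = gpa >> SKETCH_PAGE_SHIFT;
    uint64_t count = len >> SKETCH_PAGE_SHIFT;
    uint64_t cleared = 0;

    assert(dirty != NULL);
    for (uint64_t i = first; i < npages && i - first < count; i++) {
        if (bitmap_test(dirty, i)) {
            bitmap_clear(dirty, i);
            cleared++;
        }
    }
    return cleared;
}
```

If the guest later writes to one of those pages, dirty logging would set the bit again and the page would be sent in a later pre-copy round; the sketch does not model that part.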
Re: [Qemu-devel] [PATCH v7 0/5] virtio-balloon: free page hint reporting support
On 06/01/2018 01:07 PM, Peter Xu wrote:
> On Fri, Jun 01, 2018 at 12:58:24PM +0800, Peter Xu wrote:
>> On Tue, Apr 24, 2018 at 02:13:43PM +0800, Wei Wang wrote:
>>> This is the device part implementation to add a new feature,
>>> VIRTIO_BALLOON_F_FREE_PAGE_HINT to the virtio-balloon device. The device
>>> receives the guest free page hints from the driver and clears the
>>> corresponding bits in the dirty bitmap, so that those free pages are
>>> not transferred by the migration thread to the destination.
>>>
>>> - Test Environment
>>>     Host: Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
>>>     Guest: 8G RAM, 4 vCPU
>>>     Migration setup: migrate_set_speed 100G, migrate_set_downtime 2 second
>>>
>>> - Test Results
>>>     - Idle Guest Live Migration Time (results are averaged over 10 runs):
>>>         - Optimization v.s. Legacy = 271ms vs 1769ms --> ~86% reduction
>>>     - Guest with Linux Compilation Workload (make bzImage -j4):
>>>         - Live Migration Time (average)
>>>           Optimization v.s. Legacy = 1265ms v.s. 2634ms --> ~51% reduction
>>>         - Linux Compilation Time
>>>           Optimization v.s. Legacy = 4min56s v.s. 5min3s
>>>           --> no obvious difference
>>>
>>> - Source Code
>>>     - QEMU:  https://github.com/wei-w-wang/qemu-free-page-lm.git
>>>     - Linux: https://github.com/wei-w-wang/linux-free-page-lm.git
>>
>> Hi, Wei,
>>
>> I have a very high-level question to the series.
>>
>> IIUC the core idea for this series is that we can avoid sending some
>> of the pages if we know that we don't need to send them. I think this
>> is based on the fact that on the destination side all the pages are by
>> default zero after they are malloced. Before this series, IIUC, any
>> migration would send every single page to the destination, no matter
>> whether it's zeroed or not. So I'm uncertain about whether this will
>> affect the received bitmap on the destination side. Say, before this
>> series, the received bitmap will directly cover the whole RAM bitmap
>> after migration is finished; now it won't. Will there be any side
>> effect? I don't see an obvious issue now, but just raise this question.
>>
>> Meanwhile, this reminds me of a funnier idea: whether we can just
>> avoid sending the zero pages directly from QEMU's perspective. In
>> other words, can we just do nothing if save_zero_page() detected that
>> the page is zero (I guess is_zero_range() can be fast too, but I
>> don't know exactly how fast it is)? And how would that differ from
>> this page hinting approach, in performance and in other aspects?
>
> I noticed a problem (after I wrote the above paragraph 5 minutes
> ago...): suppose a page was valid and sent to the destination (with
> non-zero data), and after a while that page was zeroed. If we don't
> send zero pages at all, we won't send the page after it's zeroed, and
> the destination side will be left with a stale non-zero page.
>
> Is my understanding correct? Will that be a problem for this series
> too, where a valid page can possibly be freed and hinted?

I think that won't be an issue for either the zero page optimization or
this free page optimization.

For the zero page optimization, QEMU always sends compressed 0s to the
destination. The zero page is detected at the time QEMU checks it
(before sending the page). If it is a 0 page, QEMU compresses all the 0s
(actually just a flag) and sends it.

For the free page optimization, we skip free pages (they could be
thought of as 0 pages in this context). The zero pages are detected at
the time the guest reports them to QEMU. The page won't be reported if
it is non-zero (i.e. used).

Best,
Wei
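To make the zero page half of that answer concrete, here is a minimal sketch of a sender that never skips a page: an all-zero page is scanned and then encoded as a one-byte flag, so the destination still overwrites any stale contents. It is not QEMU's wire format. `encode_page`, `decode_page`, `page_is_zero`, the flag values and the 4 KiB `SKETCH_PAGE_SIZE` are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical names and format; not QEMU's migration stream. */
#define SKETCH_PAGE_SIZE 4096
enum { FLAG_ZERO = 1, FLAG_DATA = 2 };

/* The check has to touch every byte, which is the memory-bandwidth
 * cost mentioned in the thread. */
static int page_is_zero(const uint8_t *page)
{
    for (size_t i = 0; i < SKETCH_PAGE_SIZE; i++) {
        if (page[i] != 0) {
            return 0;
        }
    }
    return 1;
}

/* Encode one page into out (at least 1 + SKETCH_PAGE_SIZE bytes).
 * Returns the number of bytes produced. */
static size_t encode_page(const uint8_t *page, uint8_t *out)
{
    assert(page != NULL && out != NULL);
    if (page_is_zero(page)) {
        out[0] = FLAG_ZERO;            /* a flag only, no payload */
        return 1;
    }
    out[0] = FLAG_DATA;
    memcpy(out + 1, page, SKETCH_PAGE_SIZE);
    return 1 + SKETCH_PAGE_SIZE;
}

/* Destination side: a zero flag still rewrites the page, so a page
 * that held data earlier cannot stay stale. */
static void decode_page(const uint8_t *in, uint8_t *dest_page)
{
    if (in[0] == FLAG_ZERO) {
        memset(dest_page, 0, SKETCH_PAGE_SIZE);
    } else {
        memcpy(dest_page, in + 1, SKETCH_PAGE_SIZE);
    }
}
```

The free page case differs in that nothing is put on the wire for a hinted page, and, per the thread, the guest does not report a page that is in use.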
[Qemu-devel] [PATCH 00/33] linux-user: Begin splitting do_syscall
This function is, as I think everyone will agree, way too large.

This is about a third of the complete change, but I thought I'd
get some feedback on the method and form before I go any farther.


r~


Richard Henderson (33):
  linux-user: Split out do_syscall1
  linux-user: Relax single exit from "break"
  linux-user: Propagate goto ebadf to return
  linux-user: Propagate goto efault to return
  linux-user: Propagate goto unimplemented_nowarn to return
  linux-user: Split out goto unimplemented to do_unimplemented
  linux-user: Propagate goto fail to return
  linux-user: Make syscall number unsigned
  linux-user: Set up infrastructure for table-izing syscalls
  linux-user: Split out brk, close, exit, read, write
  linux-user: Split out execve
  linux-user: Split out open, openat
  linux-user: Split out name_to_handle_at
  linux-user: Split out open_to_handle_at
  linux-user: Split out creat, fork, waitid, waitpid
  linux-user: Split out link, linkat
  linux-user: Split out unlink, unlinkat
  linux-user: Split out chdir, mknod, mknodat, time, chmod
  linux-user: Remove all unimplemented entries
  linux-user: Split out getpid, getxpid, lseek
  linux-user: Split out mount, umount
  linux-user: Split out alarm, pause, stime, utime, utimes
  linux-user: Split out access, faccessat, futimesat, kill, nice, sync,
    syncfs
  linux-user: Split out rename, renameat, renameat2
  linux-user: Split out dup, mkdir, mkdirat, rmdir
  linux-user: Split out acct, pipe, pipe2, times, umount2
  linux-user: Split out ioctl
  linux-user: Split out chroot, dup2, dup3, fcntl, setpgid, umask
  linux-user: Split out getpgrp, getppid, setsid
  linux-user: Split out rt_sigaction, sigaction
  linux-user: Split out rt_sigprocmask, sgetmask, sigprocmask, ssetmask
  linux-user: Split out rt_sigpending, rt_sigsuspend, sigpending,
    sigsuspend
  linux-user: Split out rt_sigqueueinfo, rt_sigtimedwait,
    rt_tgsigqueueinfo

 linux-user/qemu.h    |    2 +-
 linux-user/syscall.c | 4651 ++
 2 files changed, 2394 insertions(+), 2259 deletions(-)

--
2.17.0
[Qemu-devel] [PATCH 05/33] linux-user: Propagate goto unimplemented_nowarn to return
Signed-off-by: Richard Henderson
---
 linux-user/syscall.c | 11 ---
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 8ea2099001..f7b7051c1c 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -12081,7 +12081,7 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1,
           return 0;
       }
 #else
-      goto unimplemented_nowarn;
+      return -TARGET_ENOSYS;
 #endif
 #endif
 #ifdef TARGET_NR_get_thread_area
@@ -12094,12 +12094,12 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1,
             return ts->tp_value;
         }
 #else
-        goto unimplemented_nowarn;
+        return -TARGET_ENOSYS;
 #endif
 #endif
 #ifdef TARGET_NR_getdomainname
     case TARGET_NR_getdomainname:
-        goto unimplemented_nowarn;
+        return -TARGET_ENOSYS;
 #endif
 
 #ifdef TARGET_NR_clock_settime
@@ -12184,7 +12184,7 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1,
          * holding a mutex that is shared with another process via
          * shared memory).
          */
-        goto unimplemented_nowarn;
+        return -TARGET_ENOSYS;
 #endif
 
 #if defined(TARGET_NR_utimensat)
@@ -12886,9 +12886,6 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1,
     default:
     unimplemented:
         gemu_log("qemu: Unsupported syscall: %d\n", num);
-#if defined(TARGET_NR_setxattr) || defined(TARGET_NR_get_thread_area) || defined(TARGET_NR_getdomainname) || defined(TARGET_NR_set_robust_list)
-    unimplemented_nowarn:
-#endif
         return -TARGET_ENOSYS;
     }
 fail:
--
2.17.0
[Qemu-devel] [PATCH 03/33] linux-user: Propagate goto ebadf to return
Signed-off-by: Richard Henderson --- linux-user/syscall.c | 187 +-- 1 file changed, 92 insertions(+), 95 deletions(-) diff --git a/linux-user/syscall.c b/linux-user/syscall.c index 258aff0411..d0bf650c62 100644 --- a/linux-user/syscall.c +++ b/linux-user/syscall.c @@ -8025,7 +8025,7 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1, return 0; } else { if (is_hostfd(arg1)) { -goto ebadf; +return -TARGET_EBADF; } if (!(p = lock_user(VERIFY_WRITE, arg2, arg3, 0))) goto efault; @@ -8039,7 +8039,7 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1, return ret; case TARGET_NR_write: if (is_hostfd(arg1)) { -goto ebadf; +return -TARGET_EBADF; } if (!(p = lock_user(VERIFY_READ, arg2, arg3, 1))) goto efault; @@ -8070,7 +8070,7 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1, #endif case TARGET_NR_openat: if (is_hostfd(arg1)) { -goto ebadf; +return -TARGET_EBADF; } if (!(p = lock_user_string(arg2))) goto efault; @@ -8083,7 +8083,7 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1, #if defined(TARGET_NR_name_to_handle_at) && defined(CONFIG_OPEN_BY_HANDLE) case TARGET_NR_name_to_handle_at: if (is_hostfd(arg1)) { -goto ebadf; +return -TARGET_EBADF; } ret = do_name_to_handle_at(arg1, arg2, arg3, arg4, arg5); return ret; @@ -8091,7 +8091,7 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1, #if defined(TARGET_NR_open_by_handle_at) && defined(CONFIG_OPEN_BY_HANDLE) case TARGET_NR_open_by_handle_at: if (is_hostfd(arg1)) { -goto ebadf; +return -TARGET_EBADF; } ret = do_open_by_handle_at(arg1, arg2, arg3); fd_trans_unregister(ret); @@ -8099,7 +8099,7 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1, #endif case TARGET_NR_close: if (is_hostfd(arg1)) { -goto ebadf; +return -TARGET_EBADF; } fd_trans_unregister(arg1); return get_errno(close(arg1)); @@ -8163,7 +8163,7 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1, #if defined(TARGET_NR_linkat) case 
TARGET_NR_linkat: if (is_hostfd(arg1)) { -goto ebadf; +return -TARGET_EBADF; } else { void * p2 = NULL; if (!arg2 || !arg4) @@ -8190,7 +8190,7 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1, #if defined(TARGET_NR_unlinkat) case TARGET_NR_unlinkat: if (is_hostfd(arg1)) { -goto ebadf; +return -TARGET_EBADF; } if (!(p = lock_user_string(arg2))) goto efault; @@ -8324,7 +8324,7 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1, #if defined(TARGET_NR_mknodat) case TARGET_NR_mknodat: if (is_hostfd(arg1)) { -goto ebadf; +return -TARGET_EBADF; } if (!(p = lock_user_string(arg2))) goto efault; @@ -8350,7 +8350,7 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1, #endif case TARGET_NR_lseek: if (is_hostfd(arg1)) { -goto ebadf; +return -TARGET_EBADF; } return get_errno(lseek(arg1, arg2, arg3)); #if defined(TARGET_NR_getxpid) && defined(TARGET_ALPHA) @@ -8497,7 +8497,7 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1, #if defined(TARGET_NR_futimesat) case TARGET_NR_futimesat: if (is_hostfd(arg1)) { -goto ebadf; +return -TARGET_EBADF; } else { struct timeval *tvp, tv[2]; if (arg3) { @@ -8543,7 +8543,7 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1, #if defined(TARGET_NR_faccessat) && defined(__NR_faccessat) case TARGET_NR_faccessat: if (is_hostfd(arg1)) { -goto ebadf; +return -TARGET_EBADF; } if (!(fn = lock_user_string(arg2))) { goto efault; @@ -8590,7 +8590,7 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1, #if defined(TARGET_NR_renameat) case TARGET_NR_renameat: if (is_hostfd(arg1)) { -goto ebadf; +return -TARGET_EBADF; } else { void *p2; p = lock_user_string(arg2); @@ -8607,7 +8607,7 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1, #if defined(TARGET_NR_renameat2) case TARGET_NR_renameat2: if (is_hostfd(arg1)) { -goto ebadf; +return -TARGET_EBADF; } else { void *p2;
[Qemu-devel] [PATCH 08/33] linux-user: Make syscall number unsigned
Signed-off-by: Richard Henderson
---
 linux-user/qemu.h    |  2 +-
 linux-user/syscall.c | 20 ++--
 2 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/linux-user/qemu.h b/linux-user/qemu.h
index 05a82a3628..623a8d8b7a 100644
--- a/linux-user/qemu.h
+++ b/linux-user/qemu.h
@@ -231,7 +231,7 @@ abi_long memcpy_to_target(abi_ulong dest, const void *src,
 void target_set_brk(abi_ulong new_brk);
 abi_long do_brk(abi_ulong new_brk);
 void syscall_init(void);
-abi_long do_syscall(void *cpu_env, int num, abi_long arg1,
+abi_long do_syscall(void *cpu_env, unsigned num, abi_long arg1,
                     abi_long arg2, abi_long arg3, abi_long arg4,
                     abi_long arg5, abi_long arg6, abi_long arg7,
                     abi_long arg8);
diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index a413aad658..e2e2d58e84 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -719,20 +719,20 @@ static inline int next_free_host_timer(void)
 
 /* ARM EABI and MIPS expect 64bit types aligned even on pairs or registers */
 #ifdef TARGET_ARM
-static inline int regpairs_aligned(void *cpu_env, int num)
+static inline int regpairs_aligned(void *cpu_env, unsigned num)
 {
     return ((((CPUARMState *)cpu_env)->eabi) == 1) ;
 }
 #elif defined(TARGET_MIPS) && (TARGET_ABI_BITS == 32)
-static inline int regpairs_aligned(void *cpu_env, int num) { return 1; }
+static inline int regpairs_aligned(void *cpu_env, unsigned num) { return 1; }
 #elif defined(TARGET_PPC) && !defined(TARGET_PPC64)
 /* SysV AVI for PPC32 expects 64bit parameters to be passed on odd/even pairs
  * of registers which translates to the same as ARM/MIPS, because we start with
  * r3 as arg1 */
-static inline int regpairs_aligned(void *cpu_env, int num) { return 1; }
+static inline int regpairs_aligned(void *cpu_env, unsigned num) { return 1; }
 #elif defined(TARGET_SH4)
 /* SH4 doesn't align register pairs, except for p{read,write}64 */
-static inline int regpairs_aligned(void *cpu_env, int num)
+static inline int regpairs_aligned(void *cpu_env, unsigned num)
 {
     switch (num) {
     case TARGET_NR_pread64:
@@ -744,9 +744,9 @@ static inline int regpairs_aligned(void *cpu_env, int num)
     }
 }
 #elif defined(TARGET_XTENSA)
-static inline int regpairs_aligned(void *cpu_env, int num) { return 1; }
+static inline int regpairs_aligned(void *cpu_env, unsigned num) { return 1; }
 #else
-static inline int regpairs_aligned(void *cpu_env, int num) { return 0; }
+static inline int regpairs_aligned(void *cpu_env, unsigned num) { return 0; }
 #endif
 
 #define ERRNO_TABLE_SIZE 1200
@@ -7962,9 +7962,9 @@ static int host_to_target_cpu_mask(const unsigned long *host_mask,
     return 0;
 }
 
-static abi_long do_unimplemented(int num)
+static abi_long do_unimplemented(unsigned num)
 {
-    gemu_log("qemu: Unsupported syscall: %d\n", num);
+    gemu_log("qemu: Unsupported syscall: %u\n", num);
     return -TARGET_ENOSYS;
 }
 
@@ -7973,7 +7973,7 @@ static abi_long do_unimplemented(int num)
  * of syscall results, can be performed.
  * All errnos that do_syscall() returns must be -TARGET_<errcode>.
  */
-static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1,
+static abi_long do_syscall1(void *cpu_env, unsigned num, abi_long arg1,
                             abi_long arg2, abi_long arg3, abi_long arg4,
                             abi_long arg5, abi_long arg6, abi_long arg7,
                             abi_long arg8)
@@ -12880,7 +12880,7 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1,
     return ret;
 }
 
-abi_long do_syscall(void *cpu_env, int num, abi_long arg1,
+abi_long do_syscall(void *cpu_env, unsigned num, abi_long arg1,
                     abi_long arg2, abi_long arg3, abi_long arg4,
                     abi_long arg5, abi_long arg6, abi_long arg7,
                     abi_long arg8)
--
2.17.0
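This patch carries no commit message, so the following is only one plausible reading of why an unsigned syscall number is convenient: once the number indexes a table (as in "[PATCH 09/33]"), a single `num < size` comparison rejects both too-large and formerly negative values. The sketch uses hypothetical names (`SKETCH_TABLE_SIZE`, `in_range_signed`, `in_range_unsigned`) and is not taken from the series.

```c
#include <assert.h>

/* Hypothetical table size, for illustration only. */
#define SKETCH_TABLE_SIZE 8u

/* With a signed index, two comparisons are needed to stay in bounds. */
static int in_range_signed(int num)
{
    return num >= 0 && (unsigned)num < SKETCH_TABLE_SIZE;
}

/* With an unsigned index, a value that was negative as an int has
 * already wrapped to a very large number, so one comparison suffices. */
static int in_range_unsigned(unsigned num)
{
    return num < SKETCH_TABLE_SIZE;
}
```

For example, `in_range_unsigned((unsigned)-1)` is false for the same reason `in_range_signed(-1)` is, but without the explicit `>= 0` test.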
[Qemu-devel] [PATCH 11/33] linux-user: Split out execve
At the same time, fix the repeated re-reading of the argv and env arrays from guest memory. Instead read into a unified array once. Signed-off-by: Richard Henderson --- linux-user/syscall.c | 203 ++- 1 file changed, 106 insertions(+), 97 deletions(-) diff --git a/linux-user/syscall.c b/linux-user/syscall.c index b0d268dab7..a9b59a8658 100644 --- a/linux-user/syscall.c +++ b/linux-user/syscall.c @@ -7998,6 +7998,111 @@ IMPL(close) return get_errno(close(arg1)); } +IMPL(execve) +{ +abi_ulong *guest_ptrs; +char **host_ptrs; +int argc, envc, alloc, i; +abi_ulong gp; +abi_ulong guest_argp = arg2; +abi_ulong guest_envp = arg3; +char *filename; +abi_long ret; + +/* Initial estimate of number of guest pointers required. */ +alloc = 32; +guest_ptrs = g_new(abi_ulong, alloc); + +/* Iterate through argp and envp, counting entries, and + * reading guest addresses from the arrays. + */ +for (gp = guest_argp, argc = 0; gp; gp += sizeof(abi_ulong)) { +abi_ulong addr; +if (get_user_ual(addr, gp)) { +return -TARGET_EFAULT; +} +if (!addr) { +break; +} +if (argc >= alloc) { +alloc *= 2; +guest_ptrs = g_renew(abi_ulong, guest_ptrs, alloc); +} +guest_ptrs[argc++] = addr; +} +for (gp = guest_envp, envc = 0; gp; gp += sizeof(abi_ulong)) { +abi_ulong addr; +if (get_user_ual(addr, gp)) { +return -TARGET_EFAULT; +} +if (!addr) { +break; +} +if (argc + envc >= alloc) { +alloc *= 2; +guest_ptrs = g_renew(abi_ulong, guest_ptrs, alloc); +} +guest_ptrs[argc + envc++] = addr; +} + +/* Exact number of host pointers required. */ +host_ptrs = g_new0(char *, argc + envc + 2); + +/* Iterate through the argp and envp that we already read + * and convert the guest pointers to host pointers. 
+ */ +ret = -TARGET_EFAULT; +for (i = 0; i < argc; ++i) { +char *p = lock_user_string(guest_ptrs[i]); +if (!p) { +goto fini; +} +host_ptrs[i] = p; +} +for (i = 0; i < envc; ++i) { +char *p = lock_user_string(guest_ptrs[argc + i]); +if (!p) { +goto fini; +} +host_ptrs[argc + 1 + i] = p; +} + +/* Read the executable filename. */ +filename = lock_user_string(arg1); +if (!filename) { +goto fini; +} + +/* Although execve() is not an interruptible syscall it is + * a special case where we must use the safe_syscall wrapper: + * if we allow a signal to happen before we make the host + * syscall then we will 'lose' it, because at the point of + * execve the process leaves QEMU's control. So we use the + * safe syscall wrapper to ensure that we either take the + * signal as a guest signal, or else it does not happen + * before the execve completes and makes it the other + * program's problem. + */ +ret = get_errno(safe_execve(filename, host_ptrs, host_ptrs + argc + 1)); +unlock_user(filename, arg1, 0); + + fini: +/* Deallocate everything we allocated above. 
*/ +for (i = 0; i < argc; ++i) { +if (host_ptrs[i]) { +unlock_user(host_ptrs[i], guest_ptrs[i], 0); +} +} +for (i = 0; i < envc; ++i) { +if (host_ptrs[argc + 1 + i]) { +unlock_user(host_ptrs[argc + 1 + i], guest_ptrs[argc + i], 0); +} +} +g_free(host_ptrs); +g_free(guest_ptrs); +return ret; +} + IMPL(exit) { CPUState *cpu = ENV_GET_CPU(cpu_env); @@ -8237,103 +8342,6 @@ IMPL(everything_else) unlock_user(p, arg2, 0); return ret; #endif -case TARGET_NR_execve: -{ -char **argp, **envp; -int argc, envc; -abi_ulong gp; -abi_ulong guest_argp; -abi_ulong guest_envp; -abi_ulong addr; -char **q; -int total_size = 0; - -argc = 0; -guest_argp = arg2; -for (gp = guest_argp; gp; gp += sizeof(abi_ulong)) { -if (get_user_ual(addr, gp)) -return -TARGET_EFAULT; -if (!addr) -break; -argc++; -} -envc = 0; -guest_envp = arg3; -for (gp = guest_envp; gp; gp += sizeof(abi_ulong)) { -if (get_user_ual(addr, gp)) -return -TARGET_EFAULT; -if (!addr) -break; -envc++; -} - -argp = g_new0(char *, argc + 1); -envp = g_new0(char *, envc + 1); - -for (gp = guest_argp, q = argp; gp; - gp += sizeof(abi_ulong), q++) { -if (get_user_ual(addr, gp)) -
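The commit message for the execve patch says the argv and env arrays are now read once into a unified array instead of being walked repeatedly. Stripped of guest-memory access, the idea might look like the sketch below. It is an illustration only: `struct ptr_vec`, `vec_push` and `vec_read_list` are hypothetical names, plain `realloc` stands in for the GLib allocation helpers used in the patch, and a host array stands in for reading guest pointers.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical growable array of guest addresses. */
struct ptr_vec {
    uint64_t *data;
    int len;
    int cap;
};

/* Append one value, doubling the capacity when full.
 * Returns 0 on success, -1 if allocation fails. */
static int vec_push(struct ptr_vec *v, uint64_t val)
{
    if (v->len == v->cap) {
        int ncap = v->cap ? v->cap * 2 : 4;
        uint64_t *n = realloc(v->data, (size_t)ncap * sizeof(*n));
        if (n == NULL) {
            return -1;
        }
        v->data = n;
        v->cap = ncap;
    }
    v->data[v->len++] = val;
    return 0;
}

/* Walk a zero-terminated list exactly once, appending each entry to
 * the shared vector.  Returns the entry count, or -1 on failure. */
static int vec_read_list(struct ptr_vec *v, const uint64_t *list)
{
    int n = 0;

    assert(v != NULL);
    for (; list != NULL && list[n] != 0; n++) {
        if (vec_push(v, list[n]) != 0) {
            return -1;
        }
    }
    return n;
}
```

Calling `vec_read_list` first for argv and then for envp leaves argv entries at indices `0..argc-1` and env entries at `argc..argc+envc-1`, which is the same layout the patch's `guest_ptrs[argc + i]` indexing relies on.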
[Qemu-devel] [PATCH 01/33] linux-user: Split out do_syscall1
There was supposed to be a single point of return for do_syscall
so that tracing works properly.  However, there are a few bugs
in that area.  It is significantly simpler to simply split out
an inner function to enforce this.

Signed-off-by: Richard Henderson
---
 linux-user/syscall.c | 89 +++-
 1 file changed, 54 insertions(+), 35 deletions(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index b75dd9a5bc..ebaefebcc2 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -7962,13 +7962,15 @@ static int host_to_target_cpu_mask(const unsigned long *host_mask,
     return 0;
 }
 
-/* do_syscall() should always have a single exit point at the end so
-   that actions, such as logging of syscall results, can be performed.
-   All errnos that do_syscall() returns must be -TARGET_<errcode>. */
-abi_long do_syscall(void *cpu_env, int num, abi_long arg1,
-                    abi_long arg2, abi_long arg3, abi_long arg4,
-                    abi_long arg5, abi_long arg6, abi_long arg7,
-                    abi_long arg8)
+/* This is an internal helper for do_syscall so that it is easier
+ * to have a single return point, so that actions, such as logging
+ * of syscall results, can be performed.
+ * All errnos that do_syscall() returns must be -TARGET_<errcode>.
+ */
+static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1,
+                            abi_long arg2, abi_long arg3, abi_long arg4,
+                            abi_long arg5, abi_long arg6, abi_long arg7,
+                            abi_long arg8)
 {
     CPUState *cpu = ENV_GET_CPU(cpu_env);
     abi_long ret;
@@ -7977,28 +7979,6 @@ abi_long do_syscall(void *cpu_env, int num, abi_long arg1,
     void *p;
     char *fn;
 
-#if defined(DEBUG_ERESTARTSYS)
-    /* Debug-only code for exercising the syscall-restart code paths
-     * in the per-architecture cpu main loops: restart every syscall
-     * the guest makes once before letting it through.
-     */
-    {
-        static int flag;
-
-        flag = !flag;
-        if (flag) {
-            return -TARGET_ERESTARTSYS;
-        }
-    }
-#endif
-
-#ifdef DEBUG
-    gemu_log("syscall %d", num);
-#endif
-    trace_guest_user_syscall(cpu, num, arg1, arg2, arg3, arg4, arg5, arg6, arg7, arg8);
-    if(do_strace)
-        print_syscall(num, arg1, arg2, arg3, arg4, arg5, arg6);
-
     switch(num) {
     case TARGET_NR_exit:
         /* In old applications this may be used to implement _exit(2).
@@ -13101,12 +13081,6 @@ abi_long do_syscall(void *cpu_env, int num, abi_long arg1,
         break;
     }
 fail:
-#ifdef DEBUG
-    gemu_log(" = " TARGET_ABI_FMT_ld "\n", ret);
-#endif
-    if(do_strace)
-        print_syscall_ret(num, ret);
-    trace_guest_user_syscall_ret(cpu, num, ret);
     return ret;
 efault:
     ret = -TARGET_EFAULT;
@@ -13115,3 +13089,48 @@ ebadf:
     ret = -TARGET_EBADF;
     goto fail;
 }
+
+abi_long do_syscall(void *cpu_env, int num, abi_long arg1,
+                    abi_long arg2, abi_long arg3, abi_long arg4,
+                    abi_long arg5, abi_long arg6, abi_long arg7,
+                    abi_long arg8)
+{
+    CPUState *cpu = ENV_GET_CPU(cpu_env);
+    abi_long ret;
+
+#if defined(DEBUG_ERESTARTSYS)
+    /* Debug-only code for exercising the syscall-restart code paths
+     * in the per-architecture cpu main loops: restart every syscall
+     * the guest makes once before letting it through.
+     */
+    {
+        static bool flag;
+        flag = !flag;
+        if (flag) {
+            return -TARGET_ERESTARTSYS;
+        }
+    }
+#endif
+#ifdef DEBUG
+    gemu_log("syscall %d", num);
+#endif
+
+    trace_guest_user_syscall(cpu, num, arg1, arg2, arg3, arg4,
+                             arg5, arg6, arg7, arg8);
+
+    if (unlikely(do_strace)) {
+        print_syscall(num, arg1, arg2, arg3, arg4, arg5, arg6);
+        ret = do_syscall1(cpu_env, num, arg1, arg2, arg3, arg4,
+                          arg5, arg6, arg7, arg8);
+        print_syscall_ret(num, ret);
+    } else {
+        ret = do_syscall1(cpu_env, num, arg1, arg2, arg3, arg4,
+                          arg5, arg6, arg7, arg8);
+    }
+
+#ifdef DEBUG
+    gemu_log(" = " TARGET_ABI_FMT_ld "\n", ret);
+#endif
+    trace_guest_user_syscall_ret(cpu, num, ret);
+    return ret;
+}
--
2.17.0
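The commit message's point (an inner function may return from anywhere, while the outer function keeps exactly one exit where tracing happens) can be shown in miniature. This is a sketch under assumptions, not code from the patch: `inner_syscall`, `outer_syscall`, the counters and the two error constants are all hypothetical stand-ins.

```c
#include <assert.h>

/* Hypothetical error values and trace counters, for illustration. */
enum { SKETCH_EINVAL = 22, SKETCH_ENOSYS = 38 };
static int trace_entries;
static int trace_exits;

/* Inner helper: free to return early from any branch. */
static long inner_syscall(unsigned num, long arg)
{
    switch (num) {
    case 0:
        return arg + 1;
    case 1:
        if (arg < 0) {
            return -SKETCH_EINVAL;   /* early return, no goto needed */
        }
        return arg * 2;
    default:
        return -SKETCH_ENOSYS;
    }
}

/* Outer wrapper: the only place that traces, and it has a single
 * return, so every entry is matched by exactly one exit record. */
static long outer_syscall(unsigned num, long arg)
{
    long ret;

    trace_entries++;
    ret = inner_syscall(num, arg);
    trace_exits++;
    return ret;
}
```

However many return statements `inner_syscall` gains later, `trace_entries` and `trace_exits` stay equal, which is the property the old `goto fail` convention was trying to maintain by hand.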
[Qemu-devel] [PATCH 07/33] linux-user: Propagate goto fail to return
Signed-off-by: Richard Henderson --- linux-user/syscall.c | 62 1 file changed, 23 insertions(+), 39 deletions(-) diff --git a/linux-user/syscall.c b/linux-user/syscall.c index 4269ec2c23..a413aad658 100644 --- a/linux-user/syscall.c +++ b/linux-user/syscall.c @@ -9001,8 +9001,7 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1, how = SIG_SETMASK; break; default: -ret = -TARGET_EINVAL; -goto fail; +return -TARGET_EINVAL; } mask = arg2; target_to_host_old_sigset(&set, &mask); @@ -9029,8 +9028,7 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1, how = SIG_SETMASK; break; default: -ret = -TARGET_EINVAL; -goto fail; +return -TARGET_EINVAL; } if (!(p = lock_user(VERIFY_READ, arg2, sizeof(target_sigset_t), 1))) return -TARGET_EFAULT; @@ -9073,8 +9071,7 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1, how = SIG_SETMASK; break; default: -ret = -TARGET_EINVAL; -goto fail; +return -TARGET_EINVAL; } if (!(p = lock_user(VERIFY_READ, arg2, sizeof(target_sigset_t), 1))) return -TARGET_EFAULT; @@ -9363,15 +9360,15 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1, ret = copy_from_user_fdset_ptr(&rfds, &rfds_ptr, rfd_addr, n); if (ret) { -goto fail; +return ret; } ret = copy_from_user_fdset_ptr(&wfds, &wfds_ptr, wfd_addr, n); if (ret) { -goto fail; +return ret; } ret = copy_from_user_fdset_ptr(&efds, &efds_ptr, efd_addr, n); if (ret) { -goto fail; +return ret; } if (contains_hostfd(&rfds) || contains_hostfd(&wfds) || @@ -9409,8 +9406,7 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1, sig.set = &set; if (arg_sigsize != sizeof(*target_sigset)) { /* Like the kernel, we enforce correct size sigsets */ -ret = -TARGET_EINVAL; -goto fail; +return -TARGET_EINVAL; } target_sigset = lock_user(VERIFY_READ, arg_sigset, sizeof(*target_sigset), 1); @@ -9951,18 +9947,15 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1, case TARGET_SYSLOG_ACTION_READ_CLEAR:/* Read/clear msgs */ 
case TARGET_SYSLOG_ACTION_READ_ALL: /* Read last messages */ { -ret = -TARGET_EINVAL; if (len < 0) { -goto fail; +return -TARGET_EINVAL; } -ret = 0; if (len == 0) { -return ret; +return 0; } p = lock_user(VERIFY_WRITE, arg2, arg3, 0); if (!p) { -ret = -TARGET_EFAULT; -goto fail; +return -TARGET_EFAULT; } ret = get_errno(sys_syslog((int)arg1, p, (int)arg3)); unlock_user(p, arg2, arg3); @@ -10363,8 +10356,7 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1, dirp = g_try_malloc(count); if (!dirp) { -ret = -TARGET_ENOMEM; -goto fail; +return -TARGET_ENOMEM; } ret = get_errno(sys_getdents(arg1, dirp, count)); @@ -10556,7 +10548,7 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1, if (ret < 0) { unlock_user(target_pfd, arg1, sizeof(struct target_pollfd) * nfds); -goto fail; +return ret; } } @@ -10788,7 +10780,7 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1, arg2 ? &node : NULL, NULL)); if (is_error(ret)) { -goto fail; +return ret; } if (arg1 && put_user_u32(cpu, arg1)) { return -TARGET_EFAULT; @@ -11290,8 +11282,7 @@ static abi_long do_syscall1(void *c
[Qemu-devel] [PATCH 09/33] linux-user: Set up infrastructure for table-izing syscalls
Signed-off-by: Richard Henderson
---
 linux-user/syscall.c | 42 ++
 1 file changed, 34 insertions(+), 8 deletions(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index e2e2d58e84..fc3dc3f40d 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -7962,21 +7962,34 @@ static int host_to_target_cpu_mask(const unsigned long *host_mask,
     return 0;
 }
 
+typedef abi_long impl_fn(void *cpu_env, unsigned num, abi_long arg1,
+                         abi_long arg2, abi_long arg3, abi_long arg4,
+                         abi_long arg5, abi_long arg6, abi_long arg7,
+                         abi_long arg8);
+
 static abi_long do_unimplemented(unsigned num)
 {
     gemu_log("qemu: Unsupported syscall: %u\n", num);
     return -TARGET_ENOSYS;
 }
 
+#define IMPL(NAME) \
+static abi_long impl_##NAME(void *cpu_env, unsigned num, abi_long arg1,   \
+                            abi_long arg2, abi_long arg3, abi_long arg4,  \
+                            abi_long arg5, abi_long arg6, abi_long arg7,  \
+                            abi_long arg8)
+
+IMPL(enosys)
+{
+    return do_unimplemented(num);
+}
+
 /* This is an internal helper for do_syscall so that it is easier
  * to have a single return point, so that actions, such as logging
  * of syscall results, can be performed.
  * All errnos that do_syscall() returns must be -TARGET_<errcode>.
  */
-static abi_long do_syscall1(void *cpu_env, unsigned num, abi_long arg1,
-                            abi_long arg2, abi_long arg3, abi_long arg4,
-                            abi_long arg5, abi_long arg6, abi_long arg7,
-                            abi_long arg8)
+IMPL(everything_else)
 {
     CPUState *cpu = ENV_GET_CPU(cpu_env);
     abi_long ret;
@@ -12880,6 +12893,10 @@ static abi_long do_syscall1(void *cpu_env, unsigned num, abi_long arg1,
     return ret;
 }
 
+static impl_fn * const syscall_table[] = {
+    impl_everything_else,
+};
+
 abi_long do_syscall(void *cpu_env, unsigned num, abi_long arg1,
                     abi_long arg2, abi_long arg3, abi_long arg4,
                     abi_long arg5, abi_long arg6, abi_long arg7,
@@ -12908,14 +12925,23 @@ abi_long do_syscall(void *cpu_env, unsigned num, abi_long arg1,
     trace_guest_user_syscall(cpu, num, arg1, arg2, arg3, arg4,
                              arg5, arg6, arg7, arg8);
 
+    /* ??? After impl_everything_else is fully split, initialize with NULL.  */
+    impl_fn *fn = impl_everything_else;
+    if (num < ARRAY_SIZE(syscall_table)) {
+        fn = syscall_table[num];
+    }
+    if (fn == NULL) {
+        fn = impl_enosys;
+    }
+
     if (unlikely(do_strace)) {
         print_syscall(num, arg1, arg2, arg3, arg4, arg5, arg6);
-        ret = do_syscall1(cpu_env, num, arg1, arg2, arg3, arg4,
-                          arg5, arg6, arg7, arg8);
+        ret = fn(cpu_env, num, arg1, arg2, arg3, arg4,
+                 arg5, arg6, arg7, arg8);
         print_syscall_ret(num, ret);
     } else {
-        ret = do_syscall1(cpu_env, num, arg1, arg2, arg3, arg4,
-                          arg5, arg6, arg7, arg8);
+        ret = fn(cpu_env, num, arg1, arg2, arg3, arg4,
+                 arg5, arg6, arg7, arg8);
     }
 
 #ifdef DEBUG
--
2.17.0
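A stripped-down version of this dispatch scheme, in the end state that the "???" comment anticipates (default `NULL`, then fall back to the ENOSYS handler), might look as follows. It is a sketch under assumptions rather than the series' code: the two-argument `impl_fn` signature, the `NR_*` numbers, `SKETCH_ENOSYS` and `dispatch` are invented for brevity, while the `IMPL` macro and designated-initializer table mirror the shape used in the patch.

```c
#include <assert.h>
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical syscall numbers and error value. */
enum { NR_twice = 2, NR_negate = 5, SKETCH_ENOSYS = 38 };

/* One function type shared by every handler. */
typedef long impl_fn(unsigned num, long arg1);

/* Declares a handler named impl_<NAME> with the shared signature. */
#define IMPL(NAME) static long impl_##NAME(unsigned num, long arg1)

IMPL(enosys)
{
    (void)num;
    (void)arg1;
    return -SKETCH_ENOSYS;
}

IMPL(twice)
{
    (void)num;
    return arg1 * 2;
}

IMPL(negate)
{
    (void)num;
    return -arg1;
}

/* Sparse table: slots without an initializer are NULL. */
static impl_fn * const syscall_table[] = {
    [NR_twice] = impl_twice,
    [NR_negate] = impl_negate,
};

static long dispatch(unsigned num, long arg1)
{
    impl_fn *fn = NULL;

    if (num < ARRAY_SIZE(syscall_table)) {
        fn = syscall_table[num];
    }
    if (fn == NULL) {
        fn = impl_enosys;   /* out of range or unfilled slot */
    }
    return fn(num, arg1);
}
```

During the transition the patch instead starts from `impl_everything_else`, so numbers beyond the end of the table still reach the old switch statement.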
[Qemu-devel] [PATCH 12/33] linux-user: Split out open, openat
Signed-off-by: Richard Henderson --- linux-user/syscall.c | 65 1 file changed, 42 insertions(+), 23 deletions(-) diff --git a/linux-user/syscall.c b/linux-user/syscall.c index a9b59a8658..fb1a8a4e7e 100644 --- a/linux-user/syscall.c +++ b/linux-user/syscall.c @@ -8145,6 +8145,44 @@ IMPL(exit) g_assert_not_reached(); } +#ifdef TARGET_NR_open +IMPL(open) +{ +char *fn = lock_user_string(arg1); +abi_long ret; + +if (!fn) { +return -TARGET_EFAULT; +} +ret = get_errno(do_openat(cpu_env, AT_FDCWD, fn, + target_to_host_bitmask(arg2, fcntl_flags_tbl), + arg3)); +fd_trans_unregister(ret); +unlock_user(fn, arg1, 0); +return ret; +} +#endif + +IMPL(openat) +{ +char *fn; +abi_long ret; + +if (is_hostfd(arg1)) { +return -TARGET_EBADF; +} +fn = lock_user_string(arg2); +if (!fn) { +return -TARGET_EFAULT; +} +ret = get_errno(do_openat(cpu_env, arg1, fn, + target_to_host_bitmask(arg3, fcntl_flags_tbl), + arg4)); +fd_trans_unregister(ret); +unlock_user(fn, arg2, 0); +return ret; +} + IMPL(read) { abi_long ret; @@ -8210,29 +8248,6 @@ IMPL(everything_else) char *fn; switch(num) { -#ifdef TARGET_NR_open -case TARGET_NR_open: -if (!(p = lock_user_string(arg1))) -return -TARGET_EFAULT; -ret = get_errno(do_openat(cpu_env, AT_FDCWD, p, - target_to_host_bitmask(arg2, fcntl_flags_tbl), - arg3)); -fd_trans_unregister(ret); -unlock_user(p, arg1, 0); -return ret; -#endif -case TARGET_NR_openat: -if (is_hostfd(arg1)) { -return -TARGET_EBADF; -} -if (!(p = lock_user_string(arg2))) -return -TARGET_EFAULT; -ret = get_errno(do_openat(cpu_env, arg1, p, - target_to_host_bitmask(arg3, fcntl_flags_tbl), - arg4)); -fd_trans_unregister(ret); -unlock_user(p, arg2, 0); -return ret; #if defined(TARGET_NR_name_to_handle_at) && defined(CONFIG_OPEN_BY_HANDLE) case TARGET_NR_name_to_handle_at: if (is_hostfd(arg1)) { @@ -12926,6 +12941,10 @@ static impl_fn * const syscall_table[] = { [TARGET_NR_close] = impl_close, [TARGET_NR_execve] = impl_execve, [TARGET_NR_exit] = impl_exit, +#ifdef TARGET_NR_open 
+[TARGET_NR_open] = impl_open, +#endif +[TARGET_NR_openat] = impl_openat, [TARGET_NR_read] = impl_read, [TARGET_NR_write] = impl_write, }; -- 2.17.0
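The table entries added here can be pictured with the stand-alone sketch below. It is not QEMU code: the `impl_fn` signature, the shape of the `IMPL` macro, the `NR_*` numbers and `dispatch()` are hypothetical stand-ins. The fallback to `impl_everything_else` for empty slots is an assumption about how the lookup is wired, since that part of the series is not quoted in this message.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these are the real QEMU definitions. */
typedef long impl_fn(int num, long arg1, long arg2);

#define IMPL(NAME) static long impl_##NAME(int num, long arg1, long arg2)

/* Arbitrary illustrative syscall numbers. */
enum { NR_read = 3, NR_open = 5, NR_openat = 295, NR_max = 512 };

IMPL(open)
{
    (void)num; (void)arg1; (void)arg2;
    return 100;                     /* pretend file descriptor */
}

IMPL(openat)
{
    (void)num; (void)arg2;
    return arg1 < 0 ? -9 : 101;     /* -9 stands in for -TARGET_EBADF */
}

IMPL(everything_else)
{
    (void)num; (void)arg1; (void)arg2;
    return -38;                     /* stands in for -TARGET_ENOSYS */
}

/* Designated initializers leave every unlisted slot NULL. */
static impl_fn * const syscall_table[NR_max] = {
    [NR_open] = impl_open,
    [NR_openat] = impl_openat,
};

/* Look the handler up by number; assumed fallback for empty slots. */
long dispatch(int num, long arg1, long arg2)
{
    impl_fn *fn = NULL;

    if (num >= 0 && num < NR_max) {
        fn = syscall_table[num];
    }
    if (!fn) {
        fn = impl_everything_else;
    }
    assert(fn != NULL);
    return fn(num, arg1, arg2);
}
```

With a layout like this, moving one syscall out of the big switch only needs a new `IMPL()` body and one table line, which is the shape each "Split out" patch in the series follows.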
[Qemu-devel] [PATCH 06/33] linux-user: Split out goto unimplemented to do_unimplemented
Signed-off-by: Richard Henderson --- linux-user/syscall.c | 82 +++- 1 file changed, 43 insertions(+), 39 deletions(-) diff --git a/linux-user/syscall.c b/linux-user/syscall.c index f7b7051c1c..4269ec2c23 100644 --- a/linux-user/syscall.c +++ b/linux-user/syscall.c @@ -7962,6 +7962,12 @@ static int host_to_target_cpu_mask(const unsigned long *host_mask, return 0; } +static abi_long do_unimplemented(int num) +{ +gemu_log("qemu: Unsupported syscall: %d\n", num); +return -TARGET_ENOSYS; +} + /* This is an internal helper for do_syscall so that it is easier * to have a single return point, so that actions, such as logging * of syscall results, can be performed. @@ -8342,11 +8348,11 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1, #endif #ifdef TARGET_NR_break case TARGET_NR_break: -goto unimplemented; +return do_unimplemented(num); #endif #ifdef TARGET_NR_oldstat case TARGET_NR_oldstat: -goto unimplemented; +return do_unimplemented(num); #endif case TARGET_NR_lseek: if (is_hostfd(arg1)) { @@ -8436,14 +8442,14 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1, } #endif case TARGET_NR_ptrace: -goto unimplemented; +return do_unimplemented(num); #ifdef TARGET_NR_alarm /* not on alpha */ case TARGET_NR_alarm: return alarm(arg1); #endif #ifdef TARGET_NR_oldfstat case TARGET_NR_oldfstat: -goto unimplemented; +return do_unimplemented(num); #endif #ifdef TARGET_NR_pause /* not on alpha */ case TARGET_NR_pause: @@ -8522,11 +8528,11 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1, #endif #ifdef TARGET_NR_stty case TARGET_NR_stty: -goto unimplemented; +return do_unimplemented(num); #endif #ifdef TARGET_NR_gtty case TARGET_NR_gtty: -goto unimplemented; +return do_unimplemented(num); #endif #ifdef TARGET_NR_access case TARGET_NR_access: @@ -8561,7 +8567,7 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1, #endif #ifdef TARGET_NR_ftime case TARGET_NR_ftime: -goto unimplemented; +return 
do_unimplemented(num); #endif case TARGET_NR_sync: sync(); @@ -8687,11 +8693,11 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1, return ret; #ifdef TARGET_NR_prof case TARGET_NR_prof: -goto unimplemented; +return do_unimplemented(num); #endif #ifdef TARGET_NR_signal case TARGET_NR_signal: -goto unimplemented; +return do_unimplemented(num); #endif case TARGET_NR_acct: if (arg1 == 0) { @@ -8715,7 +8721,7 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1, #endif #ifdef TARGET_NR_lock case TARGET_NR_lock: -goto unimplemented; +return do_unimplemented(num); #endif case TARGET_NR_ioctl: return do_ioctl(arg1, arg2, arg3); @@ -8725,17 +8731,17 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1, #endif #ifdef TARGET_NR_mpx case TARGET_NR_mpx: -goto unimplemented; +return do_unimplemented(num); #endif case TARGET_NR_setpgid: return get_errno(setpgid(arg1, arg2)); #ifdef TARGET_NR_ulimit case TARGET_NR_ulimit: -goto unimplemented; +return do_unimplemented(num); #endif #ifdef TARGET_NR_oldolduname case TARGET_NR_oldolduname: -goto unimplemented; +return do_unimplemented(num); #endif case TARGET_NR_umask: return get_errno(umask(arg1)); @@ -8747,7 +8753,7 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1, return ret; #ifdef TARGET_NR_ustat case TARGET_NR_ustat: -goto unimplemented; +return do_unimplemented(num); #endif #ifdef TARGET_NR_dup2 case TARGET_NR_dup2: @@ -9471,7 +9477,7 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1, #endif #ifdef TARGET_NR_oldlstat case TARGET_NR_oldlstat: -goto unimplemented; +return do_unimplemented(num); #endif #ifdef TARGET_NR_readlink case TARGET_NR_readlink: @@ -9536,7 +9542,7 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1, #endif #ifdef TARGET_NR_uselib case TARGET_NR_uselib: -goto unimplemented; +return do_unimplemented(num); #endif #ifdef TARGET_NR_swapon case TARGET_NR_swapon: @@ -9561,7 +9567,7 @@ static abi_long 
do_syscall1(void *cpu_env, int num, abi_long arg1, return ret; #ifdef TARGET_NR_readdir case TARGET_NR_readdir: -goto unimplemented; +return do_unimplemented(num); #endif #ifdef TARGET_NR_mmap case TARGET_NR_mmap: @@ -9699,7 +9705,7 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1, return get_errno(setpriority(arg1, arg2, arg3)); #ifdef TARGET_NR_pro
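A reduced illustration of the transformation in this patch, for readers without the tree at hand. The case numbers and `handle_syscall()` are hypothetical, and `fprintf` stands in for `gemu_log`. Only the idea is taken from the patch: a shared `unimplemented:` label is replaced by a helper, so each case can return directly.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Helper replacing the shared "unimplemented:" label. */
static long do_unimplemented(int num)
{
    fprintf(stderr, "Unsupported syscall: %d\n", num);
    return -ENOSYS;     /* host ENOSYS stands in for TARGET_ENOSYS */
}

/* Hypothetical dispatcher with two known cases. */
long handle_syscall(int num, long arg1)
{
    assert(num >= 0);   /* sketch only: callers pass valid numbers */

    switch (num) {
    case 1:
        return arg1 + 1;                /* some implemented call */
    case 2:
        return do_unimplemented(num);   /* was: goto unimplemented; */
    default:
        return do_unimplemented(num);
    }
}
```

Because no case jumps to a label inside the enclosing function any more, each case body can later be lifted into its own function without touching the others.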
[Qemu-devel] [PATCH 10/33] linux-user: Split out brk, close, exit, read, write
These are relatively simple unconditionally defined syscalls. Signed-off-by: Richard Henderson --- linux-user/syscall.c | 198 --- 1 file changed, 111 insertions(+), 87 deletions(-) diff --git a/linux-user/syscall.c b/linux-user/syscall.c index fc3dc3f40d..b0d268dab7 100644 --- a/linux-user/syscall.c +++ b/linux-user/syscall.c @@ -7984,6 +7984,112 @@ IMPL(enosys) return do_unimplemented(num); } +IMPL(brk) +{ +return do_brk(arg1); +} + +IMPL(close) +{ +if (is_hostfd(arg1)) { +return -TARGET_EBADF; +} +fd_trans_unregister(arg1); +return get_errno(close(arg1)); +} + +IMPL(exit) +{ +CPUState *cpu = ENV_GET_CPU(cpu_env); + +/* In old applications this may be used to implement _exit(2). + However in threaded applictions it is used for thread termination, + and _exit_group is used for application termination. + Do thread termination if we have more then one thread. */ +if (block_signals()) { +return -TARGET_ERESTARTSYS; +} + +cpu_list_lock(); + +if (CPU_NEXT(first_cpu)) { +/* Remove the CPU from the list. 
*/ +QTAILQ_REMOVE(&cpus, cpu, node); +cpu_list_unlock(); + +TaskState *ts = cpu->opaque; +if (ts->child_tidptr) { +put_user_u32(0, ts->child_tidptr); +sys_futex(g2h(ts->child_tidptr), FUTEX_WAKE, INT_MAX, + NULL, NULL, 0); +} +thread_cpu = NULL; +object_unref(OBJECT(cpu)); +g_free(ts); +rcu_unregister_thread(); +pthread_exit(NULL); +} else { +cpu_list_unlock(); + +#ifdef TARGET_GPROF +_mcleanup(); +#endif +gdb_exit(cpu_env, arg1); +_exit(arg1); +} +g_assert_not_reached(); +} + +IMPL(read) +{ +abi_long ret; +char *fn; + +if (arg3 == 0) { +return 0; +} +if (is_hostfd(arg1)) { +return -TARGET_EBADF; +} +fn = lock_user(VERIFY_WRITE, arg2, arg3, 0); +if (!fn) { +return -TARGET_EFAULT; +} +ret = get_errno(safe_read(arg1, fn, arg3)); +if (ret >= 0 && fd_trans_host_to_target_data(arg1)) { +ret = fd_trans_host_to_target_data(arg1)(fn, ret); +} +unlock_user(fn, arg2, ret); +return ret; +} + +IMPL(write) +{ +abi_long ret; +char *fn; + +if (is_hostfd(arg1)) { +return -TARGET_EBADF; +} +fn = lock_user(VERIFY_READ, arg2, arg3, 1); +if (!fn) { +return -TARGET_EFAULT; +} +if (fd_trans_target_to_host_data(arg1)) { +void *copy = g_malloc(arg3); +memcpy(copy, fn, arg3); +ret = fd_trans_target_to_host_data(arg1)(copy, arg3); +if (ret >= 0) { +ret = get_errno(safe_write(arg1, copy, ret)); +} +g_free(copy); +} else { +ret = get_errno(safe_write(arg1, fn, arg3)); +} +unlock_user(fn, arg2, ret); +return ret; +} + /* This is an internal helper for do_syscall so that it is easier * to have a single return point, so that actions, such as logging * of syscall results, can be performed. @@ -7999,83 +8105,6 @@ IMPL(everything_else) char *fn; switch(num) { -case TARGET_NR_exit: -/* In old applications this may be used to implement _exit(2). - However in threaded applictions it is used for thread termination, - and _exit_group is used for application termination. - Do thread termination if we have more then one thread. 
*/ - -if (block_signals()) { -return -TARGET_ERESTARTSYS; -} - -cpu_list_lock(); - -if (CPU_NEXT(first_cpu)) { -TaskState *ts; - -/* Remove the CPU from the list. */ -QTAILQ_REMOVE(&cpus, cpu, node); - -cpu_list_unlock(); - -ts = cpu->opaque; -if (ts->child_tidptr) { -put_user_u32(0, ts->child_tidptr); -sys_futex(g2h(ts->child_tidptr), FUTEX_WAKE, INT_MAX, - NULL, NULL, 0); -} -thread_cpu = NULL; -object_unref(OBJECT(cpu)); -g_free(ts); -rcu_unregister_thread(); -pthread_exit(NULL); -} - -cpu_list_unlock(); -#ifdef TARGET_GPROF -_mcleanup(); -#endif -gdb_exit(cpu_env, arg1); -_exit(arg1); -return 0; /* avoid warning */ -case TARGET_NR_read: -if (arg3 == 0) { -return 0; -} else { -if (is_hostfd(arg1)) { -return -TARGET_EBADF; -} -if (!(p = lock_user(VERIFY_WRITE, arg2, arg3, 0))) -return -TARGET_EFAULT; -ret = get_errno(safe_read(arg1, p, arg3)); -if (ret >= 0 && -fd_trans_host_to_target_data(arg1)) { -ret = fd_trans_host_to_target_data(arg1)(p, ret); -} -unlock_user(p, arg2, ret); -
[Qemu-devel] [PATCH 13/33] linux-user: Split out name_to_handle_at
At the same time, merge do_name_to_handle_at into the new function. Signed-off-by: Richard Henderson --- linux-user/syscall.c | 129 +-- 1 file changed, 64 insertions(+), 65 deletions(-) diff --git a/linux-user/syscall.c b/linux-user/syscall.c index fb1a8a4e7e..4afc22c20c 100644 --- a/linux-user/syscall.c +++ b/linux-user/syscall.c @@ -7369,63 +7369,6 @@ static int do_futex(target_ulong uaddr, int op, int val, target_ulong timeout, return -TARGET_ENOSYS; } } -#if defined(TARGET_NR_name_to_handle_at) && defined(CONFIG_OPEN_BY_HANDLE) -static abi_long do_name_to_handle_at(abi_long dirfd, abi_long pathname, - abi_long handle, abi_long mount_id, - abi_long flags) -{ -struct file_handle *target_fh; -struct file_handle *fh; -int mid = 0; -abi_long ret; -char *name; -unsigned int size, total_size; - -if (get_user_s32(size, handle)) { -return -TARGET_EFAULT; -} - -name = lock_user_string(pathname); -if (!name) { -return -TARGET_EFAULT; -} - -total_size = sizeof(struct file_handle) + size; -target_fh = lock_user(VERIFY_WRITE, handle, total_size, 0); -if (!target_fh) { -unlock_user(name, pathname, 0); -return -TARGET_EFAULT; -} - -fh = g_malloc0(total_size); -fh->handle_bytes = size; - -TRY_INTERP_FD(ret, name, - name_to_handle_at(interp_dirfd, name + 1, fh, &mid, flags), - name_to_handle_at(dirfd, name, fh, &mid, flags)); -ret = get_errno(ret); -unlock_user(name, pathname, 0); - -/* man name_to_handle_at(2): - * Other than the use of the handle_bytes field, the caller should treat - * the file_handle structure as an opaque data type - */ - -memcpy(target_fh, fh, total_size); -target_fh->handle_bytes = tswap32(fh->handle_bytes); -target_fh->handle_type = tswap32(fh->handle_type); -g_free(fh); -unlock_user(target_fh, handle, total_size); - -if (put_user_s32(mid, mount_id)) { -return -TARGET_EFAULT; -} - -return ret; - -} -#endif - #if defined(TARGET_NR_open_by_handle_at) && defined(CONFIG_OPEN_BY_HANDLE) static abi_long do_open_by_handle_at(abi_long mount_fd, abi_long handle, 
abi_long flags) @@ -8145,6 +8088,67 @@ IMPL(exit) g_assert_not_reached(); } +#if defined(TARGET_NR_name_to_handle_at) && defined(CONFIG_OPEN_BY_HANDLE) +IMPL(name_to_handle_at) +{ +abi_long dirfd = arg1; +abi_long pathname = arg2; +abi_long handle = arg3; +abi_long mount_id = arg4; +abi_long flags = arg5; +struct file_handle *target_fh; +struct file_handle *fh; +int mid = 0; +abi_long ret; +char *name; +unsigned int size, total_size; + +if (is_hostfd(dirfd)) { +return -TARGET_EBADF; +} +if (get_user_s32(size, handle)) { +return -TARGET_EFAULT; +} + +name = lock_user_string(pathname); +if (!name) { +return -TARGET_EFAULT; +} + +total_size = sizeof(struct file_handle) + size; +target_fh = lock_user(VERIFY_WRITE, handle, total_size, 0); +if (!target_fh) { +unlock_user(name, pathname, 0); +return -TARGET_EFAULT; +} + +fh = g_malloc0(total_size); +fh->handle_bytes = size; + +TRY_INTERP_FD(ret, name, + name_to_handle_at(interp_dirfd, name + 1, fh, &mid, flags), + name_to_handle_at(dirfd, name, fh, &mid, flags)); +ret = get_errno(ret); +unlock_user(name, pathname, 0); + +/* man name_to_handle_at(2): + * Other than the use of the handle_bytes field, the caller should treat + * the file_handle structure as an opaque data type + */ + +memcpy(target_fh, fh, total_size); +target_fh->handle_bytes = tswap32(fh->handle_bytes); +target_fh->handle_type = tswap32(fh->handle_type); +g_free(fh); +unlock_user(target_fh, handle, total_size); + +if (put_user_s32(mid, mount_id)) { +return -TARGET_EFAULT; +} +return ret; +} +#endif + #ifdef TARGET_NR_open IMPL(open) { @@ -8248,14 +8252,6 @@ IMPL(everything_else) char *fn; switch(num) { -#if defined(TARGET_NR_name_to_handle_at) && defined(CONFIG_OPEN_BY_HANDLE) -case TARGET_NR_name_to_handle_at: -if (is_hostfd(arg1)) { -return -TARGET_EBADF; -} -ret = do_name_to_handle_at(arg1, arg2, arg3, arg4, arg5); -return ret; -#endif #if defined(TARGET_NR_open_by_handle_at) && defined(CONFIG_OPEN_BY_HANDLE) case TARGET_NR_open_by_handle_at: if 
(is_hostfd(arg1)) { @@ -12941,6 +12937,9 @@ static impl_fn * const syscall_table[] = { [TARGET_NR_close] = impl_close, [TARGET_NR_execve] = impl_execve, [TARGET_NR_exit] = impl_exit, +#if defined(TARGET_NR_name_to_handle_at) && defined(CONFIG_OPEN_BY_HANDLE) +[TARGET_NR_name_to_handle_at]
[Qemu-devel] [PATCH 16/33] linux-user: Split out link, linkat
Signed-off-by: Richard Henderson --- linux-user/syscall.c | 77 +--- 1 file changed, 43 insertions(+), 34 deletions(-) diff --git a/linux-user/syscall.c b/linux-user/syscall.c index e208f8647a..b5736436f8 100644 --- a/linux-user/syscall.c +++ b/linux-user/syscall.c @@ -8078,6 +8078,43 @@ IMPL(fork) } #endif +#ifdef TARGET_NR_link +IMPL(link) +{ +char *p1 = lock_user_string(arg1); +char *p2 = lock_user_string(arg2); +abi_long ret = -TARGET_EFAULT; + +if (p1 && p2) { +ret = get_errno(link(p1, p2)); +} +unlock_user(p1, arg1, 0); +unlock_user(p2, arg2, 0); +return ret; +} +#endif + +#if defined(TARGET_NR_linkat) +IMPL(linkat) +{ +char *p1, *p2; +abi_long ret; + +if (is_hostfd(arg1)) { +return -TARGET_EBADF; +} +p1 = lock_user_string(arg2); +p2 = lock_user_string(arg4); +ret = -TARGET_EFAULT; +if (p1 && p2) { +ret = get_errno(linkat(arg1, p1, arg3, p2, arg5)); +} +unlock_user(p1, arg2, 0); +unlock_user(p2, arg4, 0); +return ret; +} +#endif + #if defined(TARGET_NR_name_to_handle_at) && defined(CONFIG_OPEN_BY_HANDLE) IMPL(name_to_handle_at) { @@ -8315,40 +8352,6 @@ IMPL(everything_else) char *fn; switch(num) { -#ifdef TARGET_NR_link -case TARGET_NR_link: -{ -void * p2; -p = lock_user_string(arg1); -p2 = lock_user_string(arg2); -if (!p || !p2) -ret = -TARGET_EFAULT; -else -ret = get_errno(link(p, p2)); -unlock_user(p2, arg2, 0); -unlock_user(p, arg1, 0); -} -return ret; -#endif -#if defined(TARGET_NR_linkat) -case TARGET_NR_linkat: -if (is_hostfd(arg1)) { -return -TARGET_EBADF; -} else { -void * p2 = NULL; -if (!arg2 || !arg4) -return -TARGET_EFAULT; -p = lock_user_string(arg2); -p2 = lock_user_string(arg4); -if (!p || !p2) -ret = -TARGET_EFAULT; -else -ret = get_errno(linkat(arg1, p, arg3, p2, arg5)); -unlock_user(p, arg2, 0); -unlock_user(p2, arg4, 0); -} -return ret; -#endif #ifdef TARGET_NR_unlink case TARGET_NR_unlink: if (!(p = lock_user_string(arg1))) @@ -12958,6 +12961,12 @@ static impl_fn * const syscall_table[] = { #ifdef TARGET_NR_fork [TARGET_NR_fork] = 
impl_fork, #endif +#ifdef TARGET_NR_link +[TARGET_NR_link] = impl_link, +#endif +#if defined(TARGET_NR_linkat) +[TARGET_NR_linkat] = impl_linkat, +#endif #if defined(TARGET_NR_name_to_handle_at) && defined(CONFIG_OPEN_BY_HANDLE) [TARGET_NR_name_to_handle_at] = impl_name_to_handle_at, #endif -- 2.17.0
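The new link and linkat bodies lock both paths, test them once, and unlock both unconditionally. The sketch below shows that pattern in isolation. `lock_string()` and `unlock_string()` are hypothetical stand-ins for `lock_user_string()` and `unlock_user()`, and the sketch assumes, as the patch appears to, that unlocking a NULL pointer is harmless.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for lock_user_string(): NULL models an unreadable guest pointer. */
static char *lock_string(const char *guest)
{
    size_t n;
    char *p;

    if (!guest) {
        return NULL;
    }
    n = strlen(guest) + 1;
    p = malloc(n);
    if (p) {
        memcpy(p, guest, n);
    }
    return p;
}

/* Stand-in for unlock_user(); like free(), it accepts NULL. */
static void unlock_string(char *p)
{
    free(p);
}

/* Counts how often the "host call" would run instead of calling link(). */
long sketch_link(const char *guest_old, const char *guest_new, int *calls)
{
    char *p1 = lock_string(guest_old);
    char *p2 = lock_string(guest_new);
    long ret = -EFAULT;

    assert(calls != NULL);
    if (p1 && p2) {
        (*calls)++;     /* the host link(p1, p2) would go here */
        ret = 0;
    }
    /* Both unlocks run on every path, so neither buffer can leak. */
    unlock_string(p1);
    unlock_string(p2);
    return ret;
}
```

Having a single exit also covers the case where the first lock succeeds and the second fails, without a separate cleanup branch.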
[Qemu-devel] [PATCH 02/33] linux-user: Relax single exit from "break"
Transform outermost "break" to "return ret". If the immediately preceding statement was an assignment to ret, return the value directly. Signed-off-by: Richard Henderson --- linux-user/syscall.c | 969 +-- 1 file changed, 390 insertions(+), 579 deletions(-) diff --git a/linux-user/syscall.c b/linux-user/syscall.c index ebaefebcc2..258aff0411 100644 --- a/linux-user/syscall.c +++ b/linux-user/syscall.c @@ -7987,8 +7987,7 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1, Do thread termination if we have more then one thread. */ if (block_signals()) { -ret = -TARGET_ERESTARTSYS; -break; +return -TARGET_ERESTARTSYS; } cpu_list_lock(); @@ -8020,12 +8019,11 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1, #endif gdb_exit(cpu_env, arg1); _exit(arg1); -ret = 0; /* avoid warning */ -break; +return 0; /* avoid warning */ case TARGET_NR_read: -if (arg3 == 0) -ret = 0; -else { +if (arg3 == 0) { +return 0; +} else { if (is_hostfd(arg1)) { goto ebadf; } @@ -8038,7 +8036,7 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1, } unlock_user(p, arg2, ret); } -break; +return ret; case TARGET_NR_write: if (is_hostfd(arg1)) { goto ebadf; @@ -8057,7 +8055,8 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1, ret = get_errno(safe_write(arg1, p, arg3)); } unlock_user(p, arg2, 0); -break; +return ret; + #ifdef TARGET_NR_open case TARGET_NR_open: if (!(p = lock_user_string(arg1))) @@ -8067,7 +8066,7 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1, arg3)); fd_trans_unregister(ret); unlock_user(p, arg1, 0); -break; +return ret; #endif case TARGET_NR_openat: if (is_hostfd(arg1)) { @@ -8080,14 +8079,14 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1, arg4)); fd_trans_unregister(ret); unlock_user(p, arg2, 0); -break; +return ret; #if defined(TARGET_NR_name_to_handle_at) && defined(CONFIG_OPEN_BY_HANDLE) case TARGET_NR_name_to_handle_at: if (is_hostfd(arg1)) { goto ebadf; } ret = 
do_name_to_handle_at(arg1, arg2, arg3, arg4, arg5); -break; +return ret; #endif #if defined(TARGET_NR_open_by_handle_at) && defined(CONFIG_OPEN_BY_HANDLE) case TARGET_NR_open_by_handle_at: @@ -8096,22 +8095,20 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1, } ret = do_open_by_handle_at(arg1, arg2, arg3); fd_trans_unregister(ret); -break; +return ret; #endif case TARGET_NR_close: if (is_hostfd(arg1)) { goto ebadf; } fd_trans_unregister(arg1); -ret = get_errno(close(arg1)); -break; +return get_errno(close(arg1)); + case TARGET_NR_brk: -ret = do_brk(arg1); -break; +return do_brk(arg1); #ifdef TARGET_NR_fork case TARGET_NR_fork: -ret = get_errno(do_fork(cpu_env, TARGET_SIGCHLD, 0, 0, 0, 0)); -break; +return get_errno(do_fork(cpu_env, TARGET_SIGCHLD, 0, 0, 0, 0)); #endif #ifdef TARGET_NR_waitpid case TARGET_NR_waitpid: @@ -8122,7 +8119,7 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1, && put_user_s32(host_to_target_waitstatus(status), arg2)) goto efault; } -break; +return ret; #endif #ifdef TARGET_NR_waitid case TARGET_NR_waitid: @@ -8137,7 +8134,7 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1, unlock_user(p, arg3, sizeof(target_siginfo_t)); } } -break; +return ret; #endif #ifdef TARGET_NR_creat /* not on alpha */ case TARGET_NR_creat: @@ -8146,7 +8143,7 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1, ret = get_errno(creat(p, arg2)); fd_trans_unregister(ret); unlock_user(p, arg1, 0); -break; +return ret; #endif #ifdef TARGET_NR_link case TARGET_NR_link: @@ -8161,7 +8158,7 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1, unlock_user(p2, arg2, 0); unlock_user(p, arg1, 0); } -break; +return ret; #endif #if defined(TARGET_NR_linkat) case TARGET_NR_linkat: @@ -8180,7 +8177,7 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1, unlock_user(p, arg2, 0); unlock_user(p
[Qemu-devel] [PATCH 04/33] linux-user: Propagate goto efault to return
Signed-off-by: Richard Henderson --- linux-user/syscall.c | 311 +-- 1 file changed, 154 insertions(+), 157 deletions(-) diff --git a/linux-user/syscall.c b/linux-user/syscall.c index d0bf650c62..8ea2099001 100644 --- a/linux-user/syscall.c +++ b/linux-user/syscall.c @@ -8028,7 +8028,7 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1, return -TARGET_EBADF; } if (!(p = lock_user(VERIFY_WRITE, arg2, arg3, 0))) -goto efault; +return -TARGET_EFAULT; ret = get_errno(safe_read(arg1, p, arg3)); if (ret >= 0 && fd_trans_host_to_target_data(arg1)) { @@ -8042,7 +8042,7 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1, return -TARGET_EBADF; } if (!(p = lock_user(VERIFY_READ, arg2, arg3, 1))) -goto efault; +return -TARGET_EFAULT; if (fd_trans_target_to_host_data(arg1)) { void *copy = g_malloc(arg3); memcpy(copy, p, arg3); @@ -8060,7 +8060,7 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1, #ifdef TARGET_NR_open case TARGET_NR_open: if (!(p = lock_user_string(arg1))) -goto efault; +return -TARGET_EFAULT; ret = get_errno(do_openat(cpu_env, AT_FDCWD, p, target_to_host_bitmask(arg2, fcntl_flags_tbl), arg3)); @@ -8073,7 +8073,7 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1, return -TARGET_EBADF; } if (!(p = lock_user_string(arg2))) -goto efault; +return -TARGET_EFAULT; ret = get_errno(do_openat(cpu_env, arg1, p, target_to_host_bitmask(arg3, fcntl_flags_tbl), arg4)); @@ -8117,7 +8117,7 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1, ret = get_errno(safe_wait4(arg1, &status, arg3, 0)); if (!is_error(ret) && arg2 && ret && put_user_s32(host_to_target_waitstatus(status), arg2)) -goto efault; +return -TARGET_EFAULT; } return ret; #endif @@ -8129,7 +8129,7 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1, ret = get_errno(safe_waitid(arg1, arg2, &info, arg4, NULL)); if (!is_error(ret) && arg3 && info.si_pid != 0) { if (!(p = lock_user(VERIFY_WRITE, arg3, 
sizeof(target_siginfo_t), 0))) -goto efault; +return -TARGET_EFAULT; host_to_target_siginfo(p, &info); unlock_user(p, arg3, sizeof(target_siginfo_t)); } @@ -8139,7 +8139,7 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1, #ifdef TARGET_NR_creat /* not on alpha */ case TARGET_NR_creat: if (!(p = lock_user_string(arg1))) -goto efault; +return -TARGET_EFAULT; ret = get_errno(creat(p, arg2)); fd_trans_unregister(ret); unlock_user(p, arg1, 0); @@ -8167,7 +8167,7 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1, } else { void * p2 = NULL; if (!arg2 || !arg4) -goto efault; +return -TARGET_EFAULT; p = lock_user_string(arg2); p2 = lock_user_string(arg4); if (!p || !p2) @@ -8182,7 +8182,7 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1, #ifdef TARGET_NR_unlink case TARGET_NR_unlink: if (!(p = lock_user_string(arg1))) -goto efault; +return -TARGET_EFAULT; ret = get_errno(unlink(p)); unlock_user(p, arg1, 0); return ret; @@ -8193,7 +8193,7 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1, return -TARGET_EBADF; } if (!(p = lock_user_string(arg2))) -goto efault; +return -TARGET_EFAULT; ret = get_errno(unlinkat(arg1, p, arg3)); unlock_user(p, arg2, 0); return ret; @@ -8213,7 +8213,7 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1, guest_argp = arg2; for (gp = guest_argp; gp; gp += sizeof(abi_ulong)) { if (get_user_ual(addr, gp)) -goto efault; +return -TARGET_EFAULT; if (!addr) break; argc++; @@ -8222,7 +8222,7 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1, guest_envp = arg3; for (gp = guest_envp; gp; gp += sizeof(abi_ulong)) { if (get_user_ual(addr, gp)) -goto efault; +return -TARGET_EFAULT;
[Qemu-devel] [PATCH 25/33] linux-user: Split out dup, mkdir, mkdirat, rmdir
Signed-off-by: Richard Henderson --- linux-user/syscall.c | 109 +-- 1 file changed, 73 insertions(+), 36 deletions(-) diff --git a/linux-user/syscall.c b/linux-user/syscall.c index 24514329b0..36092d753d 100644 --- a/linux-user/syscall.c +++ b/linux-user/syscall.c @@ -7977,6 +7977,20 @@ IMPL(creat) } #endif +IMPL(dup) +{ +abi_long ret; + +if (is_hostfd(arg1)) { +return -TARGET_EBADF; +} +ret = get_errno(dup(arg1)); +if (ret >= 0) { +fd_trans_dup(arg1, ret); +} +return ret; +} + IMPL(execve) { abi_ulong *guest_ptrs; @@ -8249,6 +8263,40 @@ IMPL(lseek) return get_errno(lseek(arg1, arg2, arg3)); } +#ifdef TARGET_NR_mkdir +IMPL(mkdir) +{ +char *p = lock_user_string(arg1); +abi_long ret; + +if (!p) { +return -TARGET_EFAULT; +} +ret = get_errno(mkdir(p, arg2)); +unlock_user(p, arg1, 0); +return ret; +} +#endif + +#ifdef TARGET_NR_mkdirat +IMPL(mkdirat) +{ +char *p; +abi_long ret; + +if (is_hostfd(arg1)) { +return -TARGET_EBADF; +} +p = lock_user_string(arg2); +if (!p) { +return -TARGET_EFAULT; +} +ret = get_errno(mkdirat(arg1, p, arg3)); +unlock_user(p, arg2, 0); +return ret; +} +#endif + #ifdef TARGET_NR_mknod IMPL(mknod) { @@ -8558,6 +8606,21 @@ IMPL(renameat2) } #endif +#ifdef TARGET_NR_rmdir +IMPL(rmdir) +{ +char *p = lock_user_string(arg1); +abi_long ret; + +if (!p) { +return -TARGET_EFAULT; +} +ret = get_errno(rmdir(p)); +unlock_user(p, arg1, 0); +return ret; +} +#endif + #ifdef TARGET_NR_stime IMPL(stime) { @@ -8768,42 +8831,6 @@ IMPL(everything_else) char *fn; switch(num) { -#ifdef TARGET_NR_mkdir -case TARGET_NR_mkdir: -if (!(p = lock_user_string(arg1))) -return -TARGET_EFAULT; -ret = get_errno(mkdir(p, arg2)); -unlock_user(p, arg1, 0); -return ret; -#endif -#if defined(TARGET_NR_mkdirat) -case TARGET_NR_mkdirat: -if (is_hostfd(arg1)) { -return -TARGET_EBADF; -} -if (!(p = lock_user_string(arg2))) -return -TARGET_EFAULT; -ret = get_errno(mkdirat(arg1, p, arg3)); -unlock_user(p, arg2, 0); -return ret; -#endif -#ifdef TARGET_NR_rmdir -case TARGET_NR_rmdir: -if (!(p 
= lock_user_string(arg1))) -return -TARGET_EFAULT; -ret = get_errno(rmdir(p)); -unlock_user(p, arg1, 0); -return ret; -#endif -case TARGET_NR_dup: -if (is_hostfd(arg1)) { -return -TARGET_EBADF; -} -ret = get_errno(dup(arg1)); -if (ret >= 0) { -fd_trans_dup(arg1, ret); -} -return ret; #ifdef TARGET_NR_pipe case TARGET_NR_pipe: return do_pipe(cpu_env, arg1, 0, 0); @@ -12922,6 +12949,7 @@ static impl_fn * const syscall_table[] = { #ifdef TARGET_NR_creat [TARGET_NR_creat] = impl_creat, #endif +[TARGET_NR_dup] = impl_dup, [TARGET_NR_execve] = impl_execve, [TARGET_NR_exit] = impl_exit, #ifdef TARGET_NR_faccessat @@ -12947,6 +12975,12 @@ static impl_fn * const syscall_table[] = { [TARGET_NR_linkat] = impl_linkat, #endif [TARGET_NR_lseek] = impl_lseek, +#ifdef TARGET_NR_mkdir +[TARGET_NR_mkdir] = impl_mkdir, +#endif +#ifdef TARGET_NR_mkdirat +[TARGET_NR_mkdirat] = impl_mkdirat, +#endif #ifdef TARGET_NR_mknod [TARGET_NR_mknod] = impl_mknod, #endif @@ -12980,6 +13014,9 @@ static impl_fn * const syscall_table[] = { #ifdef TARGET_NR_renameat2 [TARGET_NR_renameat2] = impl_renameat2, #endif +#ifdef TARGET_NR_rmdir +[TARGET_NR_rmdir] = impl_rmdir, +#endif #ifdef TARGET_NR_stime [TARGET_NR_stime] = impl_stime, #endif -- 2.17.0
[Qemu-devel] [PATCH 15/33] linux-user: Split out creat, fork, waitid, waitpid
Signed-off-by: Richard Henderson --- linux-user/syscall.c | 108 +++ 1 file changed, 69 insertions(+), 39 deletions(-) diff --git a/linux-user/syscall.c b/linux-user/syscall.c index 48bb1c0231..e208f8647a 100644 --- a/linux-user/syscall.c +++ b/linux-user/syscall.c @@ -7908,6 +7908,22 @@ IMPL(close) return get_errno(close(arg1)); } +#ifdef TARGET_NR_creat +IMPL(creat) +{ +char *p = lock_user_string(arg1); +abi_long ret; + +if (!p) { +return -TARGET_EFAULT; +} +ret = get_errno(creat(p, arg2)); +fd_trans_unregister(ret); +unlock_user(p, arg1, 0); +return ret; +} +#endif + IMPL(execve) { abi_ulong *guest_ptrs; @@ -8055,6 +8071,13 @@ IMPL(exit) g_assert_not_reached(); } +#ifdef TARGET_NR_fork +IMPL(fork) +{ +return get_errno(do_fork(cpu_env, TARGET_SIGCHLD, 0, 0, 0, 0)); +} +#endif + #if defined(TARGET_NR_name_to_handle_at) && defined(CONFIG_OPEN_BY_HANDLE) IMPL(name_to_handle_at) { @@ -8216,6 +8239,40 @@ IMPL(read) return ret; } +#ifdef TARGET_NR_waitid +IMPL(waitid) +{ +siginfo_t info; +abi_long ret; + +info.si_pid = 0; +ret = get_errno(safe_waitid(arg1, arg2, &info, arg4, NULL)); +if (!is_error(ret) && arg3 && info.si_pid != 0) { +target_siginfo_t *p += lock_user(VERIFY_WRITE, arg3, sizeof(target_siginfo_t), 0); +if (!p) { +return -TARGET_EFAULT; +} +host_to_target_siginfo(p, &info); +unlock_user(p, arg3, sizeof(target_siginfo_t)); +} +return ret; +} +#endif + +#ifdef TARGET_NR_waitpid +IMPL(waitpid) +{ +int status; +abi_long ret = get_errno(safe_wait4(arg1, &status, arg3, 0)); +if (!is_error(ret) && arg2 && ret && +put_user_s32(host_to_target_waitstatus(status), arg2)) { +return -TARGET_EFAULT; +} +return ret; +} +#endif + IMPL(write) { abi_long ret; @@ -8258,45 +8315,6 @@ IMPL(everything_else) char *fn; switch(num) { -#ifdef TARGET_NR_fork -case TARGET_NR_fork: -return get_errno(do_fork(cpu_env, TARGET_SIGCHLD, 0, 0, 0, 0)); -#endif -#ifdef TARGET_NR_waitpid -case TARGET_NR_waitpid: -{ -int status; -ret = get_errno(safe_wait4(arg1, &status, arg3, 0)); -if 
(!is_error(ret) && arg2 && ret -&& put_user_s32(host_to_target_waitstatus(status), arg2)) -return -TARGET_EFAULT; -} -return ret; -#endif -#ifdef TARGET_NR_waitid -case TARGET_NR_waitid: -{ -siginfo_t info; -info.si_pid = 0; -ret = get_errno(safe_waitid(arg1, arg2, &info, arg4, NULL)); -if (!is_error(ret) && arg3 && info.si_pid != 0) { -if (!(p = lock_user(VERIFY_WRITE, arg3, sizeof(target_siginfo_t), 0))) -return -TARGET_EFAULT; -host_to_target_siginfo(p, &info); -unlock_user(p, arg3, sizeof(target_siginfo_t)); -} -} -return ret; -#endif -#ifdef TARGET_NR_creat /* not on alpha */ -case TARGET_NR_creat: -if (!(p = lock_user_string(arg1))) -return -TARGET_EFAULT; -ret = get_errno(creat(p, arg2)); -fd_trans_unregister(ret); -unlock_user(p, arg1, 0); -return ret; -#endif #ifdef TARGET_NR_link case TARGET_NR_link: { @@ -12932,8 +12950,14 @@ IMPL(everything_else) static impl_fn * const syscall_table[] = { [TARGET_NR_brk] = impl_brk, [TARGET_NR_close] = impl_close, +#ifdef TARGET_NR_creat +[TARGET_NR_creat] = impl_creat, +#endif [TARGET_NR_execve] = impl_execve, [TARGET_NR_exit] = impl_exit, +#ifdef TARGET_NR_fork +[TARGET_NR_fork] = impl_fork, +#endif #if defined(TARGET_NR_name_to_handle_at) && defined(CONFIG_OPEN_BY_HANDLE) [TARGET_NR_name_to_handle_at] = impl_name_to_handle_at, #endif @@ -12945,6 +12969,12 @@ static impl_fn * const syscall_table[] = { [TARGET_NR_open_by_handle_at] = impl_open_by_handle_at, #endif [TARGET_NR_read] = impl_read, +#ifdef TARGET_NR_waitid +[TARGET_NR_waitid] = impl_waitid, +#endif +#ifdef TARGET_NR_waitpid +[TARGET_NR_waitpid] = impl_waitpid, +#endif [TARGET_NR_write] = impl_write, }; -- 2.17.0
[Qemu-devel] [PATCH 14/33] linux-user: Split out open_by_handle_at
At the same time, merge do_open_by_handle_at into the new function. Signed-off-by: Richard Henderson --- linux-user/syscall.c | 84 ++-- 1 file changed, 42 insertions(+), 42 deletions(-) diff --git a/linux-user/syscall.c b/linux-user/syscall.c index 4afc22c20c..48bb1c0231 100644 --- a/linux-user/syscall.c +++ b/linux-user/syscall.c @@ -7369,39 +7369,6 @@ static int do_futex(target_ulong uaddr, int op, int val, target_ulong timeout, return -TARGET_ENOSYS; } } -#if defined(TARGET_NR_open_by_handle_at) && defined(CONFIG_OPEN_BY_HANDLE) -static abi_long do_open_by_handle_at(abi_long mount_fd, abi_long handle, - abi_long flags) -{ -struct file_handle *target_fh; -struct file_handle *fh; -unsigned int size, total_size; -abi_long ret; - -if (get_user_s32(size, handle)) { -return -TARGET_EFAULT; -} - -total_size = sizeof(struct file_handle) + size; -target_fh = lock_user(VERIFY_READ, handle, total_size, 1); -if (!target_fh) { -return -TARGET_EFAULT; -} - -fh = g_memdup(target_fh, total_size); -fh->handle_bytes = size; -fh->handle_type = tswap32(target_fh->handle_type); - -ret = get_errno(open_by_handle_at(mount_fd, fh, -target_to_host_bitmask(flags, fcntl_flags_tbl))); - -g_free(fh); - -unlock_user(target_fh, handle, total_size); - -return ret; -} -#endif #if defined(TARGET_NR_signalfd) || defined(TARGET_NR_signalfd4) @@ -8187,6 +8154,45 @@ IMPL(openat) return ret; } +#if defined(TARGET_NR_open_by_handle_at) && defined(CONFIG_OPEN_BY_HANDLE) +IMPL(open_by_handle_at) +{ +abi_long mount_fd = arg1; +abi_long handle = arg2; +abi_long flags = arg3; +struct file_handle *target_fh; +struct file_handle *fh; +unsigned int size, total_size; +abi_long ret; + +if (is_hostfd(mount_fd)) { +return -TARGET_EBADF; +} +if (get_user_s32(size, handle)) { +return -TARGET_EFAULT; +} + +total_size = sizeof(struct file_handle) + size; +target_fh = lock_user(VERIFY_READ, handle, total_size, 1); +if (!target_fh) { +return -TARGET_EFAULT; +} + +fh = g_memdup(target_fh, total_size); +fh->handle_bytes 
= size; +fh->handle_type = tswap32(target_fh->handle_type); + +ret = get_errno(open_by_handle_at(mount_fd, fh, +target_to_host_bitmask(flags, fcntl_flags_tbl))); + +g_free(fh); +unlock_user(target_fh, handle, total_size); + +fd_trans_unregister(ret); +return ret; +} +#endif + IMPL(read) { abi_long ret; @@ -8252,15 +8258,6 @@ IMPL(everything_else) char *fn; switch(num) { -#if defined(TARGET_NR_open_by_handle_at) && defined(CONFIG_OPEN_BY_HANDLE) -case TARGET_NR_open_by_handle_at: -if (is_hostfd(arg1)) { -return -TARGET_EBADF; -} -ret = do_open_by_handle_at(arg1, arg2, arg3); -fd_trans_unregister(ret); -return ret; -#endif #ifdef TARGET_NR_fork case TARGET_NR_fork: return get_errno(do_fork(cpu_env, TARGET_SIGCHLD, 0, 0, 0, 0)); @@ -12944,6 +12941,9 @@ static impl_fn * const syscall_table[] = { [TARGET_NR_open] = impl_open, #endif [TARGET_NR_openat] = impl_openat, +#if defined(TARGET_NR_open_by_handle_at) && defined(CONFIG_OPEN_BY_HANDLE) +[TARGET_NR_open_by_handle_at] = impl_open_by_handle_at, +#endif [TARGET_NR_read] = impl_read, [TARGET_NR_write] = impl_write, }; -- 2.17.0
[Qemu-devel] [PATCH 17/33] linux-user: Split out unlink, unlinkat
Signed-off-by: Richard Henderson
---
 linux-user/syscall.c | 59 ++--
 1 file changed, 40 insertions(+), 19 deletions(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index b5736436f8..bbe9d6d9fb 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -8276,6 +8276,40 @@ IMPL(read)
     return ret;
 }
 
+#ifdef TARGET_NR_unlink
+IMPL(unlink)
+{
+    char *p = lock_user_string(arg1);
+    abi_long ret;
+
+    if (!p) {
+        return -TARGET_EFAULT;
+    }
+    ret = get_errno(unlink(p));
+    unlock_user(p, arg1, 0);
+    return ret;
+}
+#endif
+
+#ifdef TARGET_NR_unlinkat
+IMPL(unlinkat)
+{
+    char *p;
+    abi_long ret;
+
+    if (is_hostfd(arg1)) {
+        return -TARGET_EBADF;
+    }
+    p = lock_user_string(arg2);
+    if (!p) {
+        return -TARGET_EFAULT;
+    }
+    ret = get_errno(unlinkat(arg1, p, arg3));
+    unlock_user(p, arg2, 0);
+    return ret;
+}
+#endif
+
 #ifdef TARGET_NR_waitid
 IMPL(waitid)
 {
@@ -8352,25 +8386,6 @@ IMPL(everything_else)
     char *fn;
 
     switch(num) {
-#ifdef TARGET_NR_unlink
-    case TARGET_NR_unlink:
-        if (!(p = lock_user_string(arg1)))
-            return -TARGET_EFAULT;
-        ret = get_errno(unlink(p));
-        unlock_user(p, arg1, 0);
-        return ret;
-#endif
-#if defined(TARGET_NR_unlinkat)
-    case TARGET_NR_unlinkat:
-        if (is_hostfd(arg1)) {
-            return -TARGET_EBADF;
-        }
-        if (!(p = lock_user_string(arg2)))
-            return -TARGET_EFAULT;
-        ret = get_errno(unlinkat(arg1, p, arg3));
-        unlock_user(p, arg2, 0);
-        return ret;
-#endif
     case TARGET_NR_chdir:
         if (!(p = lock_user_string(arg1)))
             return -TARGET_EFAULT;
@@ -12978,6 +12993,12 @@ static impl_fn * const syscall_table[] = {
     [TARGET_NR_open_by_handle_at] = impl_open_by_handle_at,
 #endif
     [TARGET_NR_read] = impl_read,
+#ifdef TARGET_NR_unlink
+    [TARGET_NR_unlink] = impl_unlink,
+#endif
+#ifdef TARGET_NR_unlinkat
+    [TARGET_NR_unlinkat] = impl_unlinkat,
+#endif
 #ifdef TARGET_NR_waitid
     [TARGET_NR_waitid] = impl_waitid,
 #endif
-- 
2.17.0
[Qemu-devel] [PATCH 29/33] linux-user: Split out getpgrp, getppid, setsid
Signed-off-by: Richard Henderson
---
 linux-user/syscall.c | 36 ++--
 1 file changed, 26 insertions(+), 10 deletions(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 4d9b9cad6e..3dfb77ac11 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -8182,6 +8182,13 @@ IMPL(futimesat)
 }
 #endif
 
+#ifdef TARGET_NR_getpgrp
+IMPL(getpgrp)
+{
+    return get_errno(getpgrp());
+}
+#endif
+
 #ifdef TARGET_NR_getpid
 IMPL(getpid)
 {
@@ -8189,6 +8196,13 @@ IMPL(getpid)
 }
 #endif
 
+#ifdef TARGET_NR_getppid
+IMPL(getppid)
+{
+    return get_errno(getppid());
+}
+#endif
+
 #if defined(TARGET_NR_getxpid) && defined(TARGET_ALPHA)
 IMPL(getxpid)
 {
@@ -8721,6 +8735,11 @@ IMPL(setpgid)
     return get_errno(setpgid(arg1, arg2));
 }
 
+IMPL(setsid)
+{
+    return get_errno(setsid());
+}
+
 #ifdef TARGET_NR_stime
 IMPL(stime)
 {
@@ -8972,16 +8991,6 @@ IMPL(everything_else)
     char *fn;
 
     switch(num) {
-#ifdef TARGET_NR_getppid /* not on alpha */
-    case TARGET_NR_getppid:
-        return get_errno(getppid());
-#endif
-#ifdef TARGET_NR_getpgrp
-    case TARGET_NR_getpgrp:
-        return get_errno(getpgrp());
-#endif
-    case TARGET_NR_setsid:
-        return get_errno(setsid());
 #ifdef TARGET_NR_sigaction
     case TARGET_NR_sigaction:
         {
@@ -13020,9 +13029,15 @@ static impl_fn * const syscall_table[] = {
 #ifdef TARGET_NR_futimesat
     [TARGET_NR_futimesat] = impl_futimesat,
 #endif
+#ifdef TARGET_NR_getpgrp
+    [TARGET_NR_getpgrp] = impl_getpgrp,
+#endif
 #ifdef TARGET_NR_getpid
     [TARGET_NR_getpid] = impl_getpid,
 #endif
+#ifdef TARGET_NR_getppid
+    [TARGET_NR_getppid] = impl_getppid,
+#endif
 #if defined(TARGET_NR_getxpid) && defined(TARGET_ALPHA)
     [TARGET_NR_getxpid] = impl_getxpid,
 #endif
@@ -13084,6 +13099,7 @@ static impl_fn * const syscall_table[] = {
     [TARGET_NR_rmdir] = impl_rmdir,
 #endif
     [TARGET_NR_setpgid] = impl_setpgid,
+    [TARGET_NR_setsid] = impl_setsid,
 #ifdef TARGET_NR_stime
     [TARGET_NR_stime] = impl_stime,
 #endif
-- 
2.17.0
[Qemu-devel] [PATCH 19/33] linux-user: Remove all unimplemented entries
There is no reason to list these, since -ENOSYS is the default. Signed-off-by: Richard Henderson --- linux-user/syscall.c | 140 --- 1 file changed, 140 deletions(-) diff --git a/linux-user/syscall.c b/linux-user/syscall.c index 88e0da31ba..6a701ea8f6 100644 --- a/linux-user/syscall.c +++ b/linux-user/syscall.c @@ -8460,14 +8460,6 @@ IMPL(everything_else) char *fn; switch(num) { -#ifdef TARGET_NR_break -case TARGET_NR_break: -return do_unimplemented(num); -#endif -#ifdef TARGET_NR_oldstat -case TARGET_NR_oldstat: -return do_unimplemented(num); -#endif case TARGET_NR_lseek: if (is_hostfd(arg1)) { return -TARGET_EBADF; @@ -8555,16 +8547,10 @@ IMPL(everything_else) return get_errno(stime(&host_time)); } #endif -case TARGET_NR_ptrace: -return do_unimplemented(num); #ifdef TARGET_NR_alarm /* not on alpha */ case TARGET_NR_alarm: return alarm(arg1); #endif -#ifdef TARGET_NR_oldfstat -case TARGET_NR_oldfstat: -return do_unimplemented(num); -#endif #ifdef TARGET_NR_pause /* not on alpha */ case TARGET_NR_pause: if (!block_signals()) { @@ -8640,14 +8626,6 @@ IMPL(everything_else) } return ret; #endif -#ifdef TARGET_NR_stty -case TARGET_NR_stty: -return do_unimplemented(num); -#endif -#ifdef TARGET_NR_gtty -case TARGET_NR_gtty: -return do_unimplemented(num); -#endif #ifdef TARGET_NR_access case TARGET_NR_access: if (!(fn = lock_user_string(arg1))) { @@ -8678,10 +8656,6 @@ IMPL(everything_else) #ifdef TARGET_NR_nice /* not on alpha */ case TARGET_NR_nice: return get_errno(nice(arg1)); -#endif -#ifdef TARGET_NR_ftime -case TARGET_NR_ftime: -return do_unimplemented(num); #endif case TARGET_NR_sync: sync(); @@ -8805,14 +8779,6 @@ IMPL(everything_else) ret = host_to_target_clock_t(ret); } return ret; -#ifdef TARGET_NR_prof -case TARGET_NR_prof: -return do_unimplemented(num); -#endif -#ifdef TARGET_NR_signal -case TARGET_NR_signal: -return do_unimplemented(num); -#endif case TARGET_NR_acct: if (arg1 == 0) { ret = get_errno(acct(NULL)); @@ -8832,31 +8798,15 @@ IMPL(everything_else) 
ret = get_errno(umount2(p, arg2)); unlock_user(p, arg1, 0); return ret; -#endif -#ifdef TARGET_NR_lock -case TARGET_NR_lock: -return do_unimplemented(num); #endif case TARGET_NR_ioctl: return do_ioctl(arg1, arg2, arg3); #ifdef TARGET_NR_fcntl case TARGET_NR_fcntl: return do_fcntl(arg1, arg2, arg3); -#endif -#ifdef TARGET_NR_mpx -case TARGET_NR_mpx: -return do_unimplemented(num); #endif case TARGET_NR_setpgid: return get_errno(setpgid(arg1, arg2)); -#ifdef TARGET_NR_ulimit -case TARGET_NR_ulimit: -return do_unimplemented(num); -#endif -#ifdef TARGET_NR_oldolduname -case TARGET_NR_oldolduname: -return do_unimplemented(num); -#endif case TARGET_NR_umask: return get_errno(umask(arg1)); case TARGET_NR_chroot: @@ -8865,10 +8815,6 @@ IMPL(everything_else) ret = get_errno(chroot(p)); unlock_user(p, arg1, 0); return ret; -#ifdef TARGET_NR_ustat -case TARGET_NR_ustat: -return do_unimplemented(num); -#endif #ifdef TARGET_NR_dup2 case TARGET_NR_dup2: if (is_hostfd(arg1) || is_hostfd(arg2)) { @@ -9585,10 +9531,6 @@ IMPL(everything_else) } return ret; #endif -#ifdef TARGET_NR_oldlstat -case TARGET_NR_oldlstat: -return do_unimplemented(num); -#endif #ifdef TARGET_NR_readlink case TARGET_NR_readlink: { @@ -9650,10 +9592,6 @@ IMPL(everything_else) } return ret; #endif -#ifdef TARGET_NR_uselib -case TARGET_NR_uselib: -return do_unimplemented(num); -#endif #ifdef TARGET_NR_swapon case TARGET_NR_swapon: if (!(p = lock_user_string(arg1))) @@ -9675,10 +9613,6 @@ IMPL(everything_else) ret = get_errno(reboot(arg1, arg2, arg3, NULL)); } return ret; -#ifdef TARGET_NR_readdir -case TARGET_NR_readdir: -return do_unimplemented(num); -#endif #ifdef TARGET_NR_mmap case TARGET_NR_mmap: #if (defined(TARGET_I386) && defined(TARGET_ABI32)) || \ @@ -9813,10 +9747,6 @@ IMPL(everything_else) return ret; case TARGET_NR_setpriority: return get_errno(setpriority(arg1, arg2, arg3)); -#ifdef TARGET_NR_profil -case TARGET_NR_profil: -return do_unimplemented(num); -#endif case TARGET_NR_statfs: if (!(fn = 
lock_user_string(arg1))) { return -TARGET_EFAULT; @@ -9892,10 +9822,6 @@ IMPL(everything_else) ret = get_errno(fstatfs(arg1, &stfs)); goto convert_statfs64; #endif -#ifdef TARGET_NR_ioperm -case TARGET_NR_ioperm:
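A note on why dropping these cases is safe: with a table-driven dispatcher, any syscall number that has no entry can fall through to a single -ENOSYS path. Below is a minimal sketch of that idea, not the actual QEMU code. The NR_* numbers, the one-argument impl_fn signature, and dispatch() are all hypothetical names invented for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical syscall numbers, for illustration only. */
enum { NR_GETPID = 0, NR_BREAK = 1, NR_SYNC = 2, NR_MAX = 3 };

/* Hypothetical, simplified handler signature. */
typedef long impl_fn(long arg1);

static long impl_getpid(long arg1)
{
    (void)arg1;
    return 1234;            /* placeholder value, not a real pid */
}

static long impl_sync(long arg1)
{
    (void)arg1;
    return 0;
}

/* Designated initializers: slots that are not listed (NR_BREAK) stay NULL. */
static impl_fn * const syscall_table[NR_MAX] = {
    [NR_GETPID] = impl_getpid,
    [NR_SYNC] = impl_sync,
};

/* One default path covers out-of-range numbers and unlisted entries alike. */
long dispatch(int num, long arg1)
{
    impl_fn *fn = NULL;

    if (num >= 0 && num < NR_MAX) {
        fn = syscall_table[num];
    }
    if (!fn) {
        return -ENOSYS;
    }
    return fn(arg1);
}

/* Usage example: a listed entry runs its handler, an unlisted one defaults. */
void dispatch_example(void)
{
    assert(dispatch(NR_SYNC, 0) == 0);
    assert(dispatch(NR_BREAK, 0) == -ENOSYS);
}
```

Under these assumptions, an explicit "unimplemented" entry for NR_BREAK would behave exactly like leaving the slot empty, which is the point the commit message makes.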
[Qemu-devel] [PATCH 18/33] linux-user: Split out chdir, mknod, mknodat, time, chmod
Signed-off-by: Richard Henderson --- linux-user/syscall.c | 132 --- 1 file changed, 87 insertions(+), 45 deletions(-) diff --git a/linux-user/syscall.c b/linux-user/syscall.c index bbe9d6d9fb..88e0da31ba 100644 --- a/linux-user/syscall.c +++ b/linux-user/syscall.c @@ -7899,6 +7899,34 @@ IMPL(brk) return do_brk(arg1); } +IMPL(chdir) +{ +char *p = lock_user_string(arg1); +abi_long ret; + +if (!p) { +return -TARGET_EFAULT; +} +ret = get_errno(chdir(p)); +unlock_user(p, arg1, 0); +return ret; +} + +#ifdef TARGET_NR_chmod +IMPL(chmod) +{ +char *p = lock_user_string(arg1); +abi_long ret; + +if (!p) { +return -TARGET_EFAULT; +} +ret = get_errno(chmod(p, arg2)); +unlock_user(p, arg1, 0); +return ret; +} +#endif + IMPL(close) { if (is_hostfd(arg1)) { @@ -8115,6 +8143,40 @@ IMPL(linkat) } #endif +#ifdef TARGET_NR_mknod +IMPL(mknod) +{ +char *p = lock_user_string(arg1); +abi_long ret; + +if (!p) { +return -TARGET_EFAULT; +} +ret = get_errno(mknod(p, arg2, arg3)); +unlock_user(p, arg1, 0); +return ret; +} +#endif + +#ifdef TARGET_NR_mknodat +IMPL(mknodat) +{ +char *p; +abi_long ret; + +if (is_hostfd(arg1)) { +return -TARGET_EBADF; +} +p = lock_user_string(arg2); +if (!p) { +return -TARGET_EFAULT; +} +ret = get_errno(mknodat(arg1, p, arg3, arg4)); +unlock_user(p, arg2, 0); +return ret; +} +#endif + #if defined(TARGET_NR_name_to_handle_at) && defined(CONFIG_OPEN_BY_HANDLE) IMPL(name_to_handle_at) { @@ -8276,6 +8338,18 @@ IMPL(read) return ret; } +#ifdef TARGET_NR_time +IMPL(time) +{ +time_t host_time; +abi_long ret = get_errno(time(&host_time)); +if (!is_error(ret) && arg1 && put_user_sal(host_time, arg1)) { +return -TARGET_EFAULT; +} +return ret; +} +#endif + #ifdef TARGET_NR_unlink IMPL(unlink) { @@ -8386,51 +8460,6 @@ IMPL(everything_else) char *fn; switch(num) { -case TARGET_NR_chdir: -if (!(p = lock_user_string(arg1))) -return -TARGET_EFAULT; -ret = get_errno(chdir(p)); -unlock_user(p, arg1, 0); -return ret; -#ifdef TARGET_NR_time -case TARGET_NR_time: -{ -time_t host_time; 
-ret = get_errno(time(&host_time)); -if (!is_error(ret) -&& arg1 -&& put_user_sal(host_time, arg1)) -return -TARGET_EFAULT; -} -return ret; -#endif -#ifdef TARGET_NR_mknod -case TARGET_NR_mknod: -if (!(p = lock_user_string(arg1))) -return -TARGET_EFAULT; -ret = get_errno(mknod(p, arg2, arg3)); -unlock_user(p, arg1, 0); -return ret; -#endif -#if defined(TARGET_NR_mknodat) -case TARGET_NR_mknodat: -if (is_hostfd(arg1)) { -return -TARGET_EBADF; -} -if (!(p = lock_user_string(arg2))) -return -TARGET_EFAULT; -ret = get_errno(mknodat(arg1, p, arg3, arg4)); -unlock_user(p, arg2, 0); -return ret; -#endif -#ifdef TARGET_NR_chmod -case TARGET_NR_chmod: -if (!(p = lock_user_string(arg1))) -return -TARGET_EFAULT; -ret = get_errno(chmod(p, arg2)); -unlock_user(p, arg1, 0); -return ret; -#endif #ifdef TARGET_NR_break case TARGET_NR_break: return do_unimplemented(num); @@ -12968,6 +12997,10 @@ IMPL(everything_else) static impl_fn * const syscall_table[] = { [TARGET_NR_brk] = impl_brk, [TARGET_NR_close] = impl_close, +[TARGET_NR_chdir] = impl_chdir, +#ifdef TARGET_NR_chmod +[TARGET_NR_chmod] = impl_chmod, +#endif #ifdef TARGET_NR_creat [TARGET_NR_creat] = impl_creat, #endif @@ -12982,6 +13015,12 @@ static impl_fn * const syscall_table[] = { #if defined(TARGET_NR_linkat) [TARGET_NR_linkat] = impl_linkat, #endif +#ifdef TARGET_NR_mknod +[TARGET_NR_mknod] = impl_mknod, +#endif +#ifdef TARGET_NR_mknodat +[TARGET_NR_mknodat] = impl_mknodat, +#endif #if defined(TARGET_NR_name_to_handle_at) && defined(CONFIG_OPEN_BY_HANDLE) [TARGET_NR_name_to_handle_at] = impl_name_to_handle_at, #endif @@ -12993,6 +13032,9 @@ static impl_fn * const syscall_table[] = { [TARGET_NR_open_by_handle_at] = impl_open_by_handle_at, #endif [TARGET_NR_read] = impl_read, +#ifdef TARGET_NR_time +[TARGET_NR_time] = impl_time, +#endif #ifdef TARGET_NR_unlink [TARGET_NR_unlink] = impl_unlink, #endif -- 2.17.0
[Qemu-devel] [PATCH 31/33] linux-user: Split out rt_sigprocmask, sgetmask, sigprocmask, ssetmask
Signed-off-by: Richard Henderson --- linux-user/syscall.c | 294 +++ 1 file changed, 158 insertions(+), 136 deletions(-) diff --git a/linux-user/syscall.c b/linux-user/syscall.c index 36e2bb838e..e37a3ab643 100644 --- a/linux-user/syscall.c +++ b/linux-user/syscall.c @@ -8805,6 +8805,65 @@ IMPL(rt_sigaction) return ret; } +IMPL(rt_sigprocmask) +{ +int how = 0; +sigset_t set, oldset, *set_ptr = NULL; +abi_long ret; +target_sigset_t *p; + +if (arg4 != sizeof(target_sigset_t)) { +return -TARGET_EINVAL; +} + +if (arg2) { +switch (arg1) { +case TARGET_SIG_BLOCK: +how = SIG_BLOCK; +break; +case TARGET_SIG_UNBLOCK: +how = SIG_UNBLOCK; +break; +case TARGET_SIG_SETMASK: +how = SIG_SETMASK; +break; +default: +return -TARGET_EINVAL; +} +p = lock_user(VERIFY_READ, arg2, sizeof(target_sigset_t), 1); +if (!p) { +return -TARGET_EFAULT; +} +target_to_host_sigset(&set, p); +unlock_user(p, arg2, 0); +set_ptr = &set; +} +ret = do_sigprocmask(how, set_ptr, &oldset); +if (!is_error(ret) && arg3) { +p = lock_user(VERIFY_WRITE, arg3, sizeof(target_sigset_t), 0); +if (!p) { +return -TARGET_EFAULT; +} +host_to_target_sigset(p, &oldset); +unlock_user(p, arg3, sizeof(target_sigset_t)); +} +return ret; +} + +#ifdef TARGET_NR_sgetmask +IMPL(sgetmask) +{ +sigset_t cur_set; +abi_ulong target_set; +abi_long ret = do_sigprocmask(0, NULL, &cur_set); +if (!ret) { +host_to_target_old_sigset(&target_set, &cur_set); +ret = target_set; +} +return ret; +} +#endif + IMPL(setpgid) { return get_errno(setpgid(arg1, arg2)); @@ -8901,6 +8960,95 @@ IMPL(sigaction) } #endif +#ifdef TARGET_NR_sigprocmask +IMPL(sigprocmask) +{ +abi_long ret; +# ifdef TARGET_ALPHA +sigset_t set, oldset; +abi_ulong mask; +int how; + +switch (arg1) { +case TARGET_SIG_BLOCK: +how = SIG_BLOCK; +break; +case TARGET_SIG_UNBLOCK: +how = SIG_UNBLOCK; +break; +case TARGET_SIG_SETMASK: +how = SIG_SETMASK; +break; +default: +return -TARGET_EINVAL; +} +mask = arg2; +target_to_host_old_sigset(&set, &mask); + +ret = do_sigprocmask(how, &set, 
&oldset); +if (!is_error(ret)) { +host_to_target_old_sigset(&mask, &oldset); +ret = mask; +((CPUAlphaState *)cpu_env)->ir[IR_V0] = 0; /* force no error */ +} +# else +sigset_t set, oldset, *set_ptr = NULL; +int how = 0; +abi_ulong *p; + +if (arg2) { +switch (arg1) { +case TARGET_SIG_BLOCK: +how = SIG_BLOCK; +break; +case TARGET_SIG_UNBLOCK: +how = SIG_UNBLOCK; +break; +case TARGET_SIG_SETMASK: +how = SIG_SETMASK; +break; +default: +return -TARGET_EINVAL; +} +p = lock_user(VERIFY_READ, arg2, sizeof(target_sigset_t), 1); +if (!p) { +return -TARGET_EFAULT; +} +target_to_host_old_sigset(&set, p); +unlock_user(p, arg2, 0); +set_ptr = &set; +} +ret = do_sigprocmask(how, set_ptr, &oldset); +if (!is_error(ret) && arg3) { +p = lock_user(VERIFY_WRITE, arg3, sizeof(target_sigset_t), 0); +if (!p) { +return -TARGET_EFAULT; +} +host_to_target_old_sigset(p, &oldset); +unlock_user(p, arg3, sizeof(target_sigset_t)); +} +# endif +return ret; +} +#endif + +#ifdef TARGET_NR_ssetmask +IMPL(ssetmask) +{ +sigset_t set, oset; +abi_ulong target_set = arg1; +abi_long ret; + +target_to_host_old_sigset(&set, &target_set); +ret = do_sigprocmask(SIG_SETMASK, &set, &oset); +if (!ret) { +host_to_target_old_sigset(&target_set, &oset); +ret = target_set; +} +return ret; +} +#endif + #ifdef TARGET_NR_stime IMPL(stime) { @@ -9152,142 +9300,6 @@ IMPL(everything_else) char *fn; switch(num) { -#ifdef TARGET_NR_sgetmask /* not on alpha */ -case TARGET_NR_sgetmask: -{ -sigset_t cur_set; -abi_ulong target_set; -ret = do_sigprocmask(0, NULL, &cur_set); -if (!ret) { -host_to_target_old_sigset(&target_set, &cur_set); -ret = target_set; -} -} -return ret; -#endif -#ifdef TARGET_NR_ssetmask /* not on alpha */ -case TARGET_NR_ssetmask: -{ -sigset_t set, oset; -abi_ulong target_set = arg1; -target_to_host_old_sigset(&set, &target_set); -ret = do_sigprocmask(SIG_SETMASK, &set, &oset); -if (!ret) { -host_to_target_old_sigset(&t
[Qemu-devel] [PATCH 23/33] linux-user: Split out access, faccessat, futimesat, kill, nice, sync, syncfs
Signed-off-by: Richard Henderson --- linux-user/syscall.c | 179 +++ 1 file changed, 113 insertions(+), 66 deletions(-) diff --git a/linux-user/syscall.c b/linux-user/syscall.c index b3838c5161..2a172e24eb 100644 --- a/linux-user/syscall.c +++ b/linux-user/syscall.c @@ -7894,6 +7894,24 @@ IMPL(enosys) return do_unimplemented(num); } +#ifdef TARGET_NR_access +IMPL(access) +{ +char *fn = lock_user_string(arg1); +abi_long ret; + +if (!fn) { +return -TARGET_EFAULT; +} +TRY_INTERP_FD(ret, fn, + faccessat(interp_dirfd, fn + 1, arg2, 0), + access(fn, arg2)); +ret = get_errno(ret); +unlock_user(fn, arg1, 0); +return ret; +} +#endif + #ifdef TARGET_NR_alarm IMPL(alarm) { @@ -8106,6 +8124,28 @@ IMPL(exit) g_assert_not_reached(); } +#ifdef TARGET_NR_faccessat +IMPL(faccessat) +{ +char *fn; +abi_long ret; + +if (is_hostfd(arg1)) { +return -TARGET_EBADF; +} +fn = lock_user_string(arg2); +if (!fn) { +return -TARGET_EFAULT; +} +TRY_INTERP_FD(ret, fn, + faccessat(interp_dirfd, fn + 1, arg3, 0), + faccessat(arg1, fn, arg3, 0)); +ret = get_errno(ret); +unlock_user(fn, arg2, 0); +return ret; +} +#endif + #ifdef TARGET_NR_fork IMPL(fork) { @@ -8113,6 +8153,37 @@ IMPL(fork) } #endif +#ifdef TARGET_NR_futimesat +IMPL(futimesat) +{ +struct timeval tv[2], *tvp = NULL; +char *fn; +abi_long ret; + +if (is_hostfd(arg1)) { +return -TARGET_EBADF; +} +if (arg3) { +if (copy_from_user_timeval(&tv[0], arg3) || +copy_from_user_timeval(&tv[1], + arg3 + sizeof(struct target_timeval))) { +return -TARGET_EFAULT; +} +tvp = tv; +} +fn = lock_user_string(arg2); +if (!fn) { +return -TARGET_EFAULT; +} +TRY_INTERP_FD(ret, fn, + futimesat(interp_dirfd, fn + 1, tvp), + futimesat(arg1, fn, tvp)); +ret = get_errno(ret); +unlock_user(fn, arg2, 0); +return ret; +} +#endif + #ifdef TARGET_NR_getpid IMPL(getpid) { @@ -8128,6 +8199,11 @@ IMPL(getxpid) } #endif +IMPL(kill) +{ +return get_errno(safe_kill(arg1, target_to_host_signal(arg2))); +} + #ifdef TARGET_NR_link IMPL(link) { @@ -8309,6 +8385,13 @@ 
IMPL(name_to_handle_at) } #endif +#ifdef TARGET_NR_nice +IMPL(nice) +{ +return get_errno(nice(arg1)); +} +#endif + #ifdef TARGET_NR_open IMPL(open) { @@ -8432,6 +8515,19 @@ IMPL(stime) } #endif +IMPL(sync) +{ +sync(); +return 0; +} + +#if defined(TARGET_NR_syncfs) && defined(CONFIG_SYNCFS) +IMPL(syncfs) +{ +return get_errno(syncfs(arg1)); +} +#endif + #ifdef TARGET_NR_time IMPL(time) { @@ -8618,72 +8714,6 @@ IMPL(everything_else) char *fn; switch(num) { -#if defined(TARGET_NR_futimesat) -case TARGET_NR_futimesat: -if (is_hostfd(arg1)) { -return -TARGET_EBADF; -} else { -struct timeval *tvp, tv[2]; -if (arg3) { -if (copy_from_user_timeval(&tv[0], arg3) -|| copy_from_user_timeval(&tv[1], - arg3 + sizeof(struct target_timeval))) -return -TARGET_EFAULT; -tvp = tv; -} else { -tvp = NULL; -} -if (!(fn = lock_user_string(arg2))) { -return -TARGET_EFAULT; -} -TRY_INTERP_FD(ret, fn, - futimesat(interp_dirfd, fn + 1, tvp), - futimesat(arg1, fn, tvp)); -ret = get_errno(ret); -unlock_user(fn, arg2, 0); -} -return ret; -#endif -#ifdef TARGET_NR_access -case TARGET_NR_access: -if (!(fn = lock_user_string(arg1))) { -return -TARGET_EFAULT; -} -TRY_INTERP_FD(ret, fn, - faccessat(interp_dirfd, fn + 1, arg2, 0), - access(fn, arg2)); -ret = get_errno(ret); -unlock_user(fn, arg1, 0); -return ret; -#endif -#if defined(TARGET_NR_faccessat) && defined(__NR_faccessat) -case TARGET_NR_faccessat: -if (is_hostfd(arg1)) { -return -TARGET_EBADF; -} -if (!(fn = lock_user_string(arg2))) { -return -TARGET_EFAULT; -} -TRY_INTERP_FD(ret, fn, - faccessat(interp_dirfd, fn + 1, arg3, 0), - faccessat(arg1, fn, arg3, 0)); -ret = get_errno(ret); -unlock_user(fn, arg2, 0); -return ret; -#endif -#ifdef TARGET_NR_nice /* not on alpha */ -case TARGET_NR_nice: -return get_errno(nice(arg1)); -#endif -case TARGET_NR_sync: -sync(); -return 0; -#if defined(TARGET_NR_syncfs) && defined(CONFIG_SYNCFS) -case TARGET_NR_sync
[Qemu-devel] [PATCH 21/33] linux-user: Split out mount, umount
Signed-off-by: Richard Henderson --- linux-user/syscall.c | 123 +-- 1 file changed, 60 insertions(+), 63 deletions(-) diff --git a/linux-user/syscall.c b/linux-user/syscall.c index b568144369..53eac58ec0 100644 --- a/linux-user/syscall.c +++ b/linux-user/syscall.c @@ -8200,6 +8200,47 @@ IMPL(mknodat) } #endif +IMPL(mount) +{ +char *p1 = NULL, *p2, *p3 = NULL; +abi_long ret = -TARGET_EFAULT; + +if (arg1) { +p1 = lock_user_string(arg1); +if (!p1) { +goto exit1; +} +} +p2 = lock_user_string(arg2); +if (!p2) { +goto exit2; +} +if (arg3) { +p3 = lock_user_string(arg3); +if (!p3) { +goto exit3; +} +} + +/* FIXME - arg5 should be locked, but it isn't clear how to do that + * since it's not guaranteed to be a NULL-terminated string. + */ +ret = mount(p1, p2, p3, (unsigned long)arg4, arg5 ? g2h(arg5) : NULL); +ret = get_errno(ret); + +if (arg3) { +unlock_user(p3, arg3, 0); +} + exit3: +unlock_user(p2, arg2, 0); + exit2: +if (arg1) { +unlock_user(p1, arg1, 0); +} + exit1: +return ret; +} + #if defined(TARGET_NR_name_to_handle_at) && defined(CONFIG_OPEN_BY_HANDLE) IMPL(name_to_handle_at) { @@ -8373,6 +8414,21 @@ IMPL(time) } #endif +#ifdef TARGET_NR_umount +IMPL(umount) +{ +char *p = lock_user_string(arg1); +abi_long ret; + +if (!p) { +return -TARGET_EFAULT; +} +ret = get_errno(umount(p)); +unlock_user(p, arg1, 0); +return ret; +} +#endif + #ifdef TARGET_NR_unlink IMPL(unlink) { @@ -8483,69 +8539,6 @@ IMPL(everything_else) char *fn; switch(num) { -case TARGET_NR_mount: -{ -/* need to look at the data field */ -void *p2, *p3; - -if (arg1) { -p = lock_user_string(arg1); -if (!p) { -return -TARGET_EFAULT; -} -} else { -p = NULL; -} - -p2 = lock_user_string(arg2); -if (!p2) { -if (arg1) { -unlock_user(p, arg1, 0); -} -return -TARGET_EFAULT; -} - -if (arg3) { -p3 = lock_user_string(arg3); -if (!p3) { -if (arg1) { -unlock_user(p, arg1, 0); -} -unlock_user(p2, arg2, 0); -return -TARGET_EFAULT; -} -} else { -p3 = NULL; -} - -/* FIXME - arg5 should be locked, but it isn't clear how to 
- * do that since it's not guaranteed to be a NULL-terminated - * string. - */ -if (!arg5) { -ret = mount(p, p2, p3, (unsigned long)arg4, NULL); -} else { -ret = mount(p, p2, p3, (unsigned long)arg4, g2h(arg5)); -} -ret = get_errno(ret); - -if (arg1) { -unlock_user(p, arg1, 0); -} -unlock_user(p2, arg2, 0); -if (arg3) { -unlock_user(p3, arg3, 0); -} -} -return ret; -#ifdef TARGET_NR_umount -case TARGET_NR_umount: -if (!(p = lock_user_string(arg1))) -return -TARGET_EFAULT; -ret = get_errno(umount(p)); -unlock_user(p, arg1, 0); -return ret; -#endif #ifdef TARGET_NR_stime /* not on alpha */ case TARGET_NR_stime: { @@ -12896,6 +12889,7 @@ static impl_fn * const syscall_table[] = { #ifdef TARGET_NR_mknodat [TARGET_NR_mknodat] = impl_mknodat, #endif +[TARGET_NR_mount] = impl_mount, #if defined(TARGET_NR_name_to_handle_at) && defined(CONFIG_OPEN_BY_HANDLE) [TARGET_NR_name_to_handle_at] = impl_name_to_handle_at, #endif @@ -12910,6 +12904,9 @@ static impl_fn * const syscall_table[] = { #ifdef TARGET_NR_time [TARGET_NR_time] = impl_time, #endif +#ifdef TARGET_NR_umount +[TARGET_NR_umount] = impl_umount, +#endif #ifdef TARGET_NR_unlink [TARGET_NR_unlink] = impl_unlink, #endif -- 2.17.0
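For readers comparing the old and new mount code: the rewrite replaces the duplicated unlock calls on every failure path with a single chain of exit labels, so each lock is released in exactly one place. The following is a rough, standalone sketch of that unwinding pattern. lock_string(), unlock_string(), BAD_ADDR and live_locks are hypothetical stand-ins for lock_user_string()/unlock_user() and a faulting guest address, and the real mount(2) call is replaced by a placeholder.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sentinel that models a guest address which cannot be locked. */
static const char BAD_ADDR[] = "<unmapped>";

/* Counts outstanding locks so a leak on any path is easy to observe. */
static int live_locks;

/* Hypothetical stand-in for lock_user_string(): NULL means "fault". */
static char *lock_string(const char *guest)
{
    size_t n;
    char *copy;

    if (!guest || guest == BAD_ADDR) {
        return NULL;
    }
    n = strlen(guest) + 1;
    copy = malloc(n);
    if (copy) {
        memcpy(copy, guest, n);
        live_locks++;
    }
    return copy;
}

/* Hypothetical stand-in for unlock_user(); a NULL pointer is a no-op. */
static void unlock_string(char *host)
{
    if (host) {
        free(host);
        live_locks--;
    }
}

/* src and fstype are optional (NULL means "not supplied"); target is not. */
long mount_sketch(const char *src, const char *target, const char *fstype)
{
    char *p1 = NULL, *p2, *p3 = NULL;
    long ret = -EFAULT;

    if (src) {
        p1 = lock_string(src);
        if (!p1) {
            goto exit1;
        }
    }
    p2 = lock_string(target);
    if (!p2) {
        goto exit2;
    }
    if (fstype) {
        p3 = lock_string(fstype);
        if (!p3) {
            goto exit3;
        }
    }

    ret = 0;                /* placeholder for the real mount call */

    unlock_string(p3);
 exit3:
    unlock_string(p2);
 exit2:
    unlock_string(p1);
 exit1:
    return ret;
}
```

Each label releases only what was acquired before the corresponding failure point, in reverse order of acquisition. The sketch relies on the unlock helper ignoring NULL, where the patch instead guards the optional arguments with explicit arg1/arg3 checks.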
[Qemu-devel] [PATCH 20/33] linux-user: Split out getpid, getxpid, lseek
Signed-off-by: Richard Henderson
---
 linux-user/syscall.c | 45 +---
 1 file changed, 30 insertions(+), 15 deletions(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 6a701ea8f6..b568144369 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -8106,6 +8106,21 @@ IMPL(fork)
 }
 #endif
 
+#ifdef TARGET_NR_getpid
+IMPL(getpid)
+{
+    return get_errno(getpid());
+}
+#endif
+
+#if defined(TARGET_NR_getxpid) && defined(TARGET_ALPHA)
+IMPL(getxpid)
+{
+    ((CPUAlphaState *)cpu_env)->ir[IR_A4] = getppid();
+    return get_errno(getpid());
+}
+#endif
+
 #ifdef TARGET_NR_link
 IMPL(link)
 {
@@ -8143,6 +8158,14 @@ IMPL(linkat)
 }
 #endif
 
+IMPL(lseek)
+{
+    if (is_hostfd(arg1)) {
+        return -TARGET_EBADF;
+    }
+    return get_errno(lseek(arg1, arg2, arg3));
+}
+
 #ifdef TARGET_NR_mknod
 IMPL(mknod)
 {
@@ -8460,21 +8483,6 @@ IMPL(everything_else)
     char *fn;
 
     switch(num) {
-    case TARGET_NR_lseek:
-        if (is_hostfd(arg1)) {
-            return -TARGET_EBADF;
-        }
-        return get_errno(lseek(arg1, arg2, arg3));
-#if defined(TARGET_NR_getxpid) && defined(TARGET_ALPHA)
-    /* Alpha specific */
-    case TARGET_NR_getxpid:
-        ((CPUAlphaState *)cpu_env)->ir[IR_A4] = getppid();
-        return get_errno(getpid());
-#endif
-#ifdef TARGET_NR_getpid
-    case TARGET_NR_getpid:
-        return get_errno(getpid());
-#endif
     case TARGET_NR_mount:
         {
             /* need to look at the data field */
@@ -12869,12 +12877,19 @@ static impl_fn * const syscall_table[] = {
 #ifdef TARGET_NR_fork
     [TARGET_NR_fork] = impl_fork,
 #endif
+#ifdef TARGET_NR_getpid
+    [TARGET_NR_getpid] = impl_getpid,
+#endif
+#if defined(TARGET_NR_getxpid) && defined(TARGET_ALPHA)
+    [TARGET_NR_getxpid] = impl_getxpid,
+#endif
 #ifdef TARGET_NR_link
     [TARGET_NR_link] = impl_link,
 #endif
 #if defined(TARGET_NR_linkat)
     [TARGET_NR_linkat] = impl_linkat,
 #endif
+    [TARGET_NR_lseek] = impl_lseek,
 #ifdef TARGET_NR_mknod
     [TARGET_NR_mknod] = impl_mknod,
 #endif
-- 
2.17.0
[Qemu-devel] [PATCH 32/33] linux-user: Split out rt_sigpending, rt_sigsuspend, sigpending, sigsuspend
Signed-off-by: Richard Henderson --- linux-user/syscall.c | 176 +-- 1 file changed, 101 insertions(+), 75 deletions(-) diff --git a/linux-user/syscall.c b/linux-user/syscall.c index e37a3ab643..c3bd625965 100644 --- a/linux-user/syscall.c +++ b/linux-user/syscall.c @@ -8805,6 +8805,32 @@ IMPL(rt_sigaction) return ret; } +IMPL(rt_sigpending) +{ +sigset_t set; +abi_long ret; + +/* Yes, this check is >, not != like most. We follow the kernel's + * logic and it does it like this because it implements + * NR_sigpending through the same code path, and in that case + * the old_sigset_t is smaller in size. + */ +if (arg2 > sizeof(target_sigset_t)) { +return -TARGET_EINVAL; +} +ret = get_errno(sigpending(&set)); +if (!is_error(ret)) { +target_sigset_t *p; +p = lock_user(VERIFY_WRITE, arg1, sizeof(target_sigset_t), 0); +if (!p) { +return -TARGET_EFAULT; +} +host_to_target_sigset(p, &set); +unlock_user(p, arg1, sizeof(target_sigset_t)); +} +return ret; +} + IMPL(rt_sigprocmask) { int how = 0; @@ -8850,6 +8876,29 @@ IMPL(rt_sigprocmask) return ret; } +IMPL(rt_sigsuspend) +{ +CPUState *cpu = ENV_GET_CPU(cpu_env); +TaskState *ts = cpu->opaque; +target_sigset_t *p; +abi_long ret; + +if (arg2 != sizeof(target_sigset_t)) { +return -TARGET_EINVAL; +} +p = lock_user(VERIFY_READ, arg1, sizeof(target_sigset_t), 1); +if (!p) { +return -TARGET_EFAULT; +} +target_to_host_sigset(&ts->sigsuspend_mask, p); +unlock_user(p, arg1, 0); +ret = get_errno(safe_rt_sigsuspend(&ts->sigsuspend_mask, SIGSET_T_SIZE)); +if (ret != -TARGET_ERESTARTSYS) { +ts->in_sigsuspend = 1; +} +return ret; +} + #ifdef TARGET_NR_sgetmask IMPL(sgetmask) { @@ -8960,6 +9009,24 @@ IMPL(sigaction) } #endif +#ifdef TARGET_NR_sigpending +IMPL(sigpending) +{ +sigset_t set; +abi_long ret = get_errno(sigpending(&set)); +if (!is_error(ret)) { +abi_ulong *p; +p = lock_user(VERIFY_WRITE, arg1, sizeof(target_sigset_t), 0); +if (!p) { +return -TARGET_EFAULT; +} +host_to_target_old_sigset(p, &set); +unlock_user(p, arg1, 
sizeof(target_sigset_t)); +} +return ret; +} +#endif + #ifdef TARGET_NR_sigprocmask IMPL(sigprocmask) { @@ -9032,6 +9099,32 @@ IMPL(sigprocmask) } #endif +#ifdef TARGET_NR_sigsuspend +IMPL(sigsuspend) +{ +CPUState *cpu = ENV_GET_CPU(cpu_env); +TaskState *ts = cpu->opaque; +abi_long ret; + +# ifdef TARGET_ALPHA +abi_ulong mask = arg1; +target_to_host_old_sigset(&ts->sigsuspend_mask, &mask); +# else +abi_ulong *p = lock_user(VERIFY_READ, arg1, sizeof(target_sigset_t), 1); +if (!p) { +return -TARGET_EFAULT; +} +target_to_host_old_sigset(&ts->sigsuspend_mask, p); +unlock_user(p, arg1, 0); +# endif +ret = get_errno(safe_rt_sigsuspend(&ts->sigsuspend_mask, SIGSET_T_SIZE)); +if (ret != -TARGET_ERESTARTSYS) { +ts->in_sigsuspend = 1; +} +return ret; +} +#endif + #ifdef TARGET_NR_ssetmask IMPL(ssetmask) { @@ -9300,81 +9393,6 @@ IMPL(everything_else) char *fn; switch(num) { -#ifdef TARGET_NR_sigpending -case TARGET_NR_sigpending: -{ -sigset_t set; -ret = get_errno(sigpending(&set)); -if (!is_error(ret)) { -if (!(p = lock_user(VERIFY_WRITE, arg1, sizeof(target_sigset_t), 0))) -return -TARGET_EFAULT; -host_to_target_old_sigset(p, &set); -unlock_user(p, arg1, sizeof(target_sigset_t)); -} -} -return ret; -#endif -case TARGET_NR_rt_sigpending: -{ -sigset_t set; - -/* Yes, this check is >, not != like most. We follow the kernel's - * logic and it does it like this because it implements - * NR_sigpending through the same code path, and in that case - * the old_sigset_t is smaller in size. 
- */ -if (arg2 > sizeof(target_sigset_t)) { -return -TARGET_EINVAL; -} - -ret = get_errno(sigpending(&set)); -if (!is_error(ret)) { -if (!(p = lock_user(VERIFY_WRITE, arg1, sizeof(target_sigset_t), 0))) -return -TARGET_EFAULT; -host_to_target_sigset(p, &set); -unlock_user(p, arg1, sizeof(target_sigset_t)); -} -} -return ret; -#ifdef TARGET_NR_sigsuspend -case TARGET_NR_sigsuspend: -{ -TaskState *ts = cpu->opaque; -#if defined(TARGET_ALPHA) -abi_ulong mask = arg1; -target_to_host_old_sigset(&ts->sigsuspend_mask, &mask); -#else -if (!(p = lock_user(VERIFY_READ, arg1, sizeof(target_sigset_t), 1))) -
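The comment carried over into IMPL(rt_sigpending) is easy to skim past, so here is a small sketch of the two size checks side by side. It only illustrates the comparison itself. SIGSET_SIZE and OLD_SIGSET_SIZE are made-up byte sizes, and both function names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical sizes: a full sigset and the smaller old-style sigset. */
enum { SIGSET_SIZE = 8, OLD_SIGSET_SIZE = 4 };

/*
 * rt_sigpending-style check: only sizes larger than the full set are
 * rejected, so a caller passing the smaller old sigset size still passes.
 */
long check_pending_size(size_t sigsetsize)
{
    return sigsetsize > (size_t)SIGSET_SIZE ? -EINVAL : 0;
}

/*
 * rt_sigprocmask/rt_sigsuspend-style check: the size must match exactly.
 */
long check_exact_size(size_t sigsetsize)
{
    return sigsetsize != (size_t)SIGSET_SIZE ? -EINVAL : 0;
}

/* Usage example: the old size is accepted by one check and not the other. */
void sigset_size_example(void)
{
    assert(check_pending_size(OLD_SIGSET_SIZE) == 0);
    assert(check_exact_size(OLD_SIGSET_SIZE) == -EINVAL);
}
```

With these assumed sizes, the old size is the one input on which the two checks disagree, which matches the reason the comment gives for using > rather than != here.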
[Qemu-devel] [PATCH 27/33] linux-user: Split out ioctl
At the same time, merge do_ioctl into the new function. Signed-off-by: Richard Henderson --- linux-user/syscall.c | 190 ++- 1 file changed, 97 insertions(+), 93 deletions(-) diff --git a/linux-user/syscall.c b/linux-user/syscall.c index bde1f9872f..4be71367fc 100644 --- a/linux-user/syscall.c +++ b/linux-user/syscall.c @@ -5759,97 +5759,6 @@ static IOCTLEntry ioctl_entries[] = { { 0, 0, }, }; -/* ??? Implement proper locking for ioctls. */ -/* do_ioctl() Must return target values and target errnos. */ -static abi_long do_ioctl(int fd, int cmd, abi_long arg) -{ -const IOCTLEntry *ie; -const argtype *arg_type; -abi_long ret; -uint8_t buf_temp[MAX_STRUCT_SIZE]; -int target_size; -void *argptr; - -ie = ioctl_entries; -for(;;) { -if (ie->target_cmd == 0) { -gemu_log("Unsupported ioctl: cmd=0x%04lx\n", (long)cmd); -return -TARGET_ENOSYS; -} -if (ie->target_cmd == cmd) -break; -ie++; -} -arg_type = ie->arg_type; -#if defined(DEBUG) -gemu_log("ioctl: cmd=0x%04lx (%s)\n", (long)cmd, ie->name); -#endif -if (ie->do_ioctl) { -return ie->do_ioctl(ie, buf_temp, fd, cmd, arg); -} else if (!ie->host_cmd) { -/* Some architectures define BSD ioctls in their headers - that are not implemented in Linux. 
*/ -return -TARGET_ENOSYS; -} - -switch(arg_type[0]) { -case TYPE_NULL: -/* no argument */ -ret = get_errno(safe_ioctl(fd, ie->host_cmd)); -break; -case TYPE_PTRVOID: -case TYPE_INT: -ret = get_errno(safe_ioctl(fd, ie->host_cmd, arg)); -break; -case TYPE_PTR: -arg_type++; -target_size = thunk_type_size(arg_type, 0); -switch(ie->access) { -case IOC_R: -ret = get_errno(safe_ioctl(fd, ie->host_cmd, buf_temp)); -if (!is_error(ret)) { -argptr = lock_user(VERIFY_WRITE, arg, target_size, 0); -if (!argptr) -return -TARGET_EFAULT; -thunk_convert(argptr, buf_temp, arg_type, THUNK_TARGET); -unlock_user(argptr, arg, target_size); -} -break; -case IOC_W: -argptr = lock_user(VERIFY_READ, arg, target_size, 1); -if (!argptr) -return -TARGET_EFAULT; -thunk_convert(buf_temp, argptr, arg_type, THUNK_HOST); -unlock_user(argptr, arg, 0); -ret = get_errno(safe_ioctl(fd, ie->host_cmd, buf_temp)); -break; -default: -case IOC_RW: -argptr = lock_user(VERIFY_READ, arg, target_size, 1); -if (!argptr) -return -TARGET_EFAULT; -thunk_convert(buf_temp, argptr, arg_type, THUNK_HOST); -unlock_user(argptr, arg, 0); -ret = get_errno(safe_ioctl(fd, ie->host_cmd, buf_temp)); -if (!is_error(ret)) { -argptr = lock_user(VERIFY_WRITE, arg, target_size, 0); -if (!argptr) -return -TARGET_EFAULT; -thunk_convert(argptr, buf_temp, arg_type, THUNK_TARGET); -unlock_user(argptr, arg, target_size); -} -break; -} -break; -default: -gemu_log("Unsupported ioctl type: cmd=0x%04lx type=%d\n", - (long)cmd, arg_type[0]); -ret = -TARGET_ENOSYS; -break; -} -return ret; -} - static const bitmask_transtbl iflag_tbl[] = { { TARGET_IGNBRK, TARGET_IGNBRK, IGNBRK, IGNBRK }, { TARGET_BRKINT, TARGET_BRKINT, BRKINT, BRKINT }, @@ -8231,6 +8140,102 @@ IMPL(getxpid) } #endif +/* ??? Implement proper locking for ioctls. 
*/ +IMPL(ioctl) +{ +abi_long fd = arg1; +abi_long cmd = arg2; +abi_long arg = arg3; +const IOCTLEntry *ie; +const argtype *arg_type; +abi_long ret; +uint8_t buf_temp[MAX_STRUCT_SIZE]; +int target_size; +void *argptr; + +for (ie = ioctl_entries; ; ie++) { +if (ie->target_cmd == 0) { +gemu_log("Unsupported ioctl: cmd=0x%04lx\n", (long)cmd); +return -TARGET_ENOSYS; +} +if (ie->target_cmd == cmd) { +break; +} +} +arg_type = ie->arg_type; +#if defined(DEBUG) +gemu_log("ioctl: cmd=0x%04lx (%s)\n", (long)cmd, ie->name); +#endif +if (ie->do_ioctl) { +return ie->do_ioctl(ie, buf_temp, fd, cmd, arg); +} else if (!ie->host_cmd) { +/* Some architectures define BSD ioctls in their headers + that are not implemented in Linux. */ +return -TARGET_ENOSYS; +} + +switch (arg_type[0]) { +case TYPE_NULL: +/* no argument */ +ret = get_errno(safe_ioctl(fd, ie->host_cmd)); +break; +case TYPE_PTRVOID: +case TYPE_INT: +ret = get_errno(safe_ioctl(fd, ie->host_cmd, arg)); +
[Qemu-devel] [PATCH 24/33] linux-user: Split out rename, renameat, renameat2
Signed-off-by: Richard Henderson --- linux-user/syscall.c | 113 --- 1 file changed, 63 insertions(+), 50 deletions(-) diff --git a/linux-user/syscall.c b/linux-user/syscall.c index 2a172e24eb..24514329b0 100644 --- a/linux-user/syscall.c +++ b/linux-user/syscall.c @@ -8504,6 +8504,60 @@ IMPL(read) return ret; } +#ifdef TARGET_NR_rename +IMPL(rename) +{ +char *p1 = lock_user_string(arg1); +char *p2 = lock_user_string(arg2); +abi_long ret = -TARGET_EFAULT; + +if (p1 && p2) { +ret = get_errno(rename(p1, p2)); +} +unlock_user(p2, arg2, 0); +unlock_user(p1, arg1, 0); +return ret; +} +#endif + +#if defined(TARGET_NR_renameat) +IMPL(renameat) +{ +if (is_hostfd(arg1)) { +return -TARGET_EBADF; +} + +char *p1 = lock_user_string(arg2); +char *p2 = lock_user_string(arg4); +abi_long ret = -TARGET_EFAULT; +if (p1 && p2) { +ret = get_errno(renameat(arg1, p1, arg3, p2)); +} +unlock_user(p2, arg4, 0); +unlock_user(p1, arg2, 0); +return ret; +} +#endif + +#ifdef TARGET_NR_renameat2 +IMPL(renameat2) +{ +if (is_hostfd(arg1)) { +return -TARGET_EBADF; +} + +char *p1 = lock_user_string(arg2); +char *p2 = lock_user_string(arg4); +abi_long ret = -TARGET_EFAULT; +if (p1 && p2) { +ret = get_errno(sys_renameat2(arg1, p1, arg3, p2, arg5)); +} +unlock_user(p2, arg4, 0); +unlock_user(p1, arg2, 0); +return ret; +} +#endif + #ifdef TARGET_NR_stime IMPL(stime) { @@ -8714,56 +8768,6 @@ IMPL(everything_else) char *fn; switch(num) { -#ifdef TARGET_NR_rename -case TARGET_NR_rename: -{ -void *p2; -p = lock_user_string(arg1); -p2 = lock_user_string(arg2); -if (!p || !p2) -ret = -TARGET_EFAULT; -else -ret = get_errno(rename(p, p2)); -unlock_user(p2, arg2, 0); -unlock_user(p, arg1, 0); -} -return ret; -#endif -#if defined(TARGET_NR_renameat) -case TARGET_NR_renameat: -if (is_hostfd(arg1)) { -return -TARGET_EBADF; -} else { -void *p2; -p = lock_user_string(arg2); -p2 = lock_user_string(arg4); -if (!p || !p2) -ret = -TARGET_EFAULT; -else -ret = get_errno(renameat(arg1, p, arg3, p2)); -unlock_user(p2, arg4, 
0); -unlock_user(p, arg2, 0); -} -return ret; -#endif -#if defined(TARGET_NR_renameat2) -case TARGET_NR_renameat2: -if (is_hostfd(arg1)) { -return -TARGET_EBADF; -} else { -void *p2; -p = lock_user_string(arg2); -p2 = lock_user_string(arg4); -if (!p || !p2) { -ret = -TARGET_EFAULT; -} else { -ret = get_errno(sys_renameat2(arg1, p, arg3, p2, arg5)); -} -unlock_user(p2, arg4, 0); -unlock_user(p, arg2, 0); -} -return ret; -#endif #ifdef TARGET_NR_mkdir case TARGET_NR_mkdir: if (!(p = lock_user_string(arg1))) @@ -12967,6 +12971,15 @@ static impl_fn * const syscall_table[] = { [TARGET_NR_pause] = impl_pause, #endif [TARGET_NR_read] = impl_read, +#ifdef TARGET_NR_rename +[TARGET_NR_rename] = impl_rename, +#endif +#ifdef TARGET_NR_renameat +[TARGET_NR_renameat] = impl_renameat, +#endif +#ifdef TARGET_NR_renameat2 +[TARGET_NR_renameat2] = impl_renameat2, +#endif #ifdef TARGET_NR_stime [TARGET_NR_stime] = impl_stime, #endif -- 2.17.0
[Qemu-devel] [PATCH 22/33] linux-user: Split out alarm, pause, stime, utime, utimes
Signed-off-by: Richard Henderson
---
 linux-user/syscall.c | 156 ++-
 1 file changed, 94 insertions(+), 62 deletions(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 53eac58ec0..b3838c5161 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -7894,6 +7894,13 @@ IMPL(enosys)
     return do_unimplemented(num);
 }
 
+#ifdef TARGET_NR_alarm
+IMPL(alarm)
+{
+    return alarm(arg1);
+}
+#endif
+
 IMPL(brk)
 {
     return do_brk(arg1);
@@ -8379,6 +8386,18 @@ IMPL(open_by_handle_at)
 }
 #endif
 
+#ifdef TARGET_NR_pause
+IMPL(pause)
+{
+    CPUState *cpu = ENV_GET_CPU(cpu_env);
+
+    if (!block_signals()) {
+        sigsuspend(&((TaskState *)cpu->opaque)->signal_mask);
+    }
+    return -TARGET_EINTR;
+}
+#endif
+
 IMPL(read)
 {
     abi_long ret;
@@ -8402,6 +8421,17 @@ IMPL(read)
     return ret;
 }
 
+#ifdef TARGET_NR_stime
+IMPL(stime)
+{
+    time_t host_time;
+    if (get_user_sal(host_time, arg1)) {
+        return -TARGET_EFAULT;
+    }
+    return get_errno(stime(&host_time));
+}
+#endif
+
 #ifdef TARGET_NR_time
 IMPL(time)
 {
@@ -8463,6 +8493,55 @@ IMPL(unlinkat)
 }
 #endif
 
+#ifdef TARGET_NR_utime
+IMPL(utime)
+{
+    struct utimbuf tbuf;
+    char *p;
+    abi_long ret;
+
+    if (arg2) {
+        struct target_utimbuf *target_tbuf;
+        if (!lock_user_struct(VERIFY_READ, target_tbuf, arg2, 1)) {
+            return -TARGET_EFAULT;
+        }
+        tbuf.actime = tswapal(target_tbuf->actime);
+        tbuf.modtime = tswapal(target_tbuf->modtime);
+        unlock_user_struct(target_tbuf, arg2, 0);
+    }
+    p = lock_user_string(arg1);
+    if (!p) {
+        return -TARGET_EFAULT;
+    }
+    ret = get_errno(utime(p, arg2 ? &tbuf : NULL));
+    unlock_user(p, arg1, 0);
+    return ret;
+}
+#endif
+
+#ifdef TARGET_NR_utimes
+IMPL(utimes)
+{
+    struct timeval tv[2];
+    char *p;
+    abi_long ret;
+
+    if (arg2 &&
+        (copy_from_user_timeval(&tv[0], arg2) ||
+         copy_from_user_timeval(&tv[1],
+                                arg2 + sizeof(struct target_timeval)))) {
+        return -TARGET_EFAULT;
+    }
+    p = lock_user_string(arg1);
+    if (!p) {
+        return -TARGET_EFAULT;
+    }
+    ret = get_errno(utimes(p, arg2 ? tv : NULL));
+    unlock_user(p, arg1, 0);
+    return ret;
+}
+#endif
+
 #ifdef TARGET_NR_waitid
 IMPL(waitid)
 {
@@ -8539,68 +8618,6 @@ IMPL(everything_else)
     char *fn;
 
     switch(num) {
-#ifdef TARGET_NR_stime /* not on alpha */
-    case TARGET_NR_stime:
-        {
-            time_t host_time;
-            if (get_user_sal(host_time, arg1))
-                return -TARGET_EFAULT;
-            return get_errno(stime(&host_time));
-        }
-#endif
-#ifdef TARGET_NR_alarm /* not on alpha */
-    case TARGET_NR_alarm:
-        return alarm(arg1);
-#endif
-#ifdef TARGET_NR_pause /* not on alpha */
-    case TARGET_NR_pause:
-        if (!block_signals()) {
-            sigsuspend(&((TaskState *)cpu->opaque)->signal_mask);
-        }
-        return -TARGET_EINTR;
-#endif
-#ifdef TARGET_NR_utime
-    case TARGET_NR_utime:
-        {
-            struct utimbuf tbuf, *host_tbuf;
-            struct target_utimbuf *target_tbuf;
-            if (arg2) {
-                if (!lock_user_struct(VERIFY_READ, target_tbuf, arg2, 1))
-                    return -TARGET_EFAULT;
-                tbuf.actime = tswapal(target_tbuf->actime);
-                tbuf.modtime = tswapal(target_tbuf->modtime);
-                unlock_user_struct(target_tbuf, arg2, 0);
-                host_tbuf = &tbuf;
-            } else {
-                host_tbuf = NULL;
-            }
-            if (!(p = lock_user_string(arg1)))
-                return -TARGET_EFAULT;
-            ret = get_errno(utime(p, host_tbuf));
-            unlock_user(p, arg1, 0);
-        }
-        return ret;
-#endif
-#ifdef TARGET_NR_utimes
-    case TARGET_NR_utimes:
-        {
-            struct timeval *tvp, tv[2];
-            if (arg2) {
-                if (copy_from_user_timeval(&tv[0], arg2)
-                    || copy_from_user_timeval(&tv[1],
-                                              arg2 + sizeof(struct target_timeval)))
-                    return -TARGET_EFAULT;
-                tvp = tv;
-            } else {
-                tvp = NULL;
-            }
-            if (!(p = lock_user_string(arg1)))
-                return -TARGET_EFAULT;
-            ret = get_errno(utimes(p, tvp));
-            unlock_user(p, arg1, 0);
-        }
-        return ret;
-#endif
 #if defined(TARGET_NR_futimesat)
     case TARGET_NR_futimesat:
         if (is_hostfd(arg1)) {
@@ -12856,6 +12873,9 @@ IMPL(everything_else)
 }
 
 static impl_fn * const syscall_table[] = {
+#ifdef TARGET_NR_alarm
+    [TARGET_NR_alarm] = impl_alarm,
+#endif
     [TARGET_NR_brk] = impl_brk,
     [TARGET_NR_close] = impl_close,
     [TARGET_NR_chdir] = impl_chdir,
@@ -12899,8 +12919,14 @@ static impl_fn * const syscall_table[] = {
     [TARGET_NR_openat
[Qemu-devel] [PATCH 33/33] linux-user: Split out rt_sigqueueinfo, rt_sigtimedwait, rt_tgsigqueueinfo
Signed-off-by: Richard Henderson --- linux-user/syscall.c | 129 ++- 1 file changed, 67 insertions(+), 62 deletions(-) diff --git a/linux-user/syscall.c b/linux-user/syscall.c index c3bd625965..b9e07c2d3f 100644 --- a/linux-user/syscall.c +++ b/linux-user/syscall.c @@ -8876,6 +8876,20 @@ IMPL(rt_sigprocmask) return ret; } +IMPL(rt_sigqueueinfo) +{ +siginfo_t uinfo; +target_siginfo_t *p; + +p = lock_user(VERIFY_READ, arg3, sizeof(target_siginfo_t), 1); +if (!p) { +return -TARGET_EFAULT; +} +target_to_host_siginfo(&uinfo, p); +unlock_user(p, arg3, 0); +return get_errno(sys_rt_sigqueueinfo(arg1, arg2, &uinfo)); +} + IMPL(rt_sigsuspend) { CPUState *cpu = ENV_GET_CPU(cpu_env); @@ -8899,6 +8913,56 @@ IMPL(rt_sigsuspend) return ret; } +IMPL(rt_sigtimedwait) +{ +sigset_t set; +struct timespec uts, *puts = NULL; +void *p; +siginfo_t uinfo; +abi_long ret; + +if (arg4 != sizeof(target_sigset_t)) { +return -TARGET_EINVAL; +} +p = lock_user(VERIFY_READ, arg1, sizeof(target_sigset_t), 1); +if (!p) { +return -TARGET_EFAULT; +} +target_to_host_sigset(&set, p); +unlock_user(p, arg1, 0); +if (arg3) { +puts = &uts; +target_to_host_timespec(puts, arg3); +} +ret = get_errno(safe_rt_sigtimedwait(&set, &uinfo, puts, SIGSET_T_SIZE)); +if (!is_error(ret)) { +if (arg2) { +p = lock_user(VERIFY_WRITE, arg2, sizeof(target_siginfo_t), 0); +if (!p) { +return -TARGET_EFAULT; +} +host_to_target_siginfo(p, &uinfo); +unlock_user(p, arg2, sizeof(target_siginfo_t)); +} +ret = host_to_target_signal(ret); +} +return ret; +} + +IMPL(rt_tgsigqueueinfo) +{ +siginfo_t uinfo; +target_siginfo_t *p; + +p = lock_user(VERIFY_READ, arg4, sizeof(target_siginfo_t), 1); +if (!p) { +return -TARGET_EFAULT; +} +target_to_host_siginfo(&uinfo, p); +unlock_user(p, arg4, 0); +return get_errno(sys_rt_tgsigqueueinfo(arg1, arg2, arg3, &uinfo)); +} + #ifdef TARGET_NR_sgetmask IMPL(sgetmask) { @@ -9393,68 +9457,6 @@ IMPL(everything_else) char *fn; switch(num) { -case TARGET_NR_rt_sigtimedwait: -{ -sigset_t set; -struct timespec 
uts, *puts; -siginfo_t uinfo; - -if (arg4 != sizeof(target_sigset_t)) { -return -TARGET_EINVAL; -} - -if (!(p = lock_user(VERIFY_READ, arg1, sizeof(target_sigset_t), 1))) -return -TARGET_EFAULT; -target_to_host_sigset(&set, p); -unlock_user(p, arg1, 0); -if (arg3) { -puts = &uts; -target_to_host_timespec(puts, arg3); -} else { -puts = NULL; -} -ret = get_errno(safe_rt_sigtimedwait(&set, &uinfo, puts, - SIGSET_T_SIZE)); -if (!is_error(ret)) { -if (arg2) { -p = lock_user(VERIFY_WRITE, arg2, sizeof(target_siginfo_t), - 0); -if (!p) { -return -TARGET_EFAULT; -} -host_to_target_siginfo(p, &uinfo); -unlock_user(p, arg2, sizeof(target_siginfo_t)); -} -ret = host_to_target_signal(ret); -} -} -return ret; -case TARGET_NR_rt_sigqueueinfo: -{ -siginfo_t uinfo; - -p = lock_user(VERIFY_READ, arg3, sizeof(target_siginfo_t), 1); -if (!p) { -return -TARGET_EFAULT; -} -target_to_host_siginfo(&uinfo, p); -unlock_user(p, arg3, 0); -ret = get_errno(sys_rt_sigqueueinfo(arg1, arg2, &uinfo)); -} -return ret; -case TARGET_NR_rt_tgsigqueueinfo: -{ -siginfo_t uinfo; - -p = lock_user(VERIFY_READ, arg4, sizeof(target_siginfo_t), 1); -if (!p) { -return -TARGET_EFAULT; -} -target_to_host_siginfo(&uinfo, p); -unlock_user(p, arg4, 0); -ret = get_errno(sys_rt_tgsigqueueinfo(arg1, arg2, arg3, &uinfo)); -} -return ret; #ifdef TARGET_NR_sigreturn case TARGET_NR_sigreturn: if (block_signals()) { @@ -13132,7 +13134,10 @@ static impl_fn * const syscall_table[] = { [TARGET_NR_rt_sigaction] = impl_rt_sigaction, [TARGET_NR_rt_sigpending] = impl_rt_sigpending, [TARGET_NR_rt_sigprocmask] = impl_rt_sigprocmask, +[TARGET_NR_rt_sigqueueinfo] = impl_rt_sigqueueinfo, [TARGET_NR_rt_sigsuspend] = impl_rt_sigsuspend, +[TARGET_NR_rt_sigtimedwait] = impl_rt_sigtimedwait, +[TARGET_NR_rt_tgsigqueueinfo] = impl_rt_tgsigqueuein
[Qemu-devel] [PATCH 30/33] linux-user: Split out rt_sigaction, sigaction
Signed-off-by: Richard Henderson --- linux-user/syscall.c | 325 ++- 1 file changed, 165 insertions(+), 160 deletions(-) diff --git a/linux-user/syscall.c b/linux-user/syscall.c index 3dfb77ac11..36e2bb838e 100644 --- a/linux-user/syscall.c +++ b/linux-user/syscall.c @@ -8730,6 +8730,81 @@ IMPL(rmdir) } #endif +IMPL(rt_sigaction) +{ +abi_long ret; +#ifdef TARGET_ALPHA +/* For Alpha and SPARC this is a 5 argument syscall, with + * a 'restorer' parameter which must be copied into the + * sa_restorer field of the sigaction struct. + * For Alpha that 'restorer' is arg5; for SPARC it is arg4, + * and arg5 is the sigsetsize. + * Alpha also has a separate rt_sigaction struct that it uses + * here; SPARC uses the usual sigaction struct. + */ +struct target_rt_sigaction *rt_act; +struct target_sigaction act, oact, *pact = 0; + +if (arg4 != sizeof(target_sigset_t)) { +return -TARGET_EINVAL; +} +if (arg2) { +if (!lock_user_struct(VERIFY_READ, rt_act, arg2, 1)) { +return -TARGET_EFAULT; +} +act._sa_handler = rt_act->_sa_handler; +act.sa_mask = rt_act->sa_mask; +act.sa_flags = rt_act->sa_flags; +act.sa_restorer = arg5; +unlock_user_struct(rt_act, arg2, 0); +pact = &act; +} +ret = get_errno(do_sigaction(arg1, pact, &oact)); +if (!is_error(ret) && arg3) { +if (!lock_user_struct(VERIFY_WRITE, rt_act, arg3, 0)) { +return -TARGET_EFAULT; +} +rt_act->_sa_handler = oact._sa_handler; +rt_act->sa_mask = oact.sa_mask; +rt_act->sa_flags = oact.sa_flags; +unlock_user_struct(rt_act, arg3, 1); +} +#else +# ifdef TARGET_SPARC +target_ulong restorer = arg4; +target_ulong sigsetsize = arg5; +# else +target_ulong sigsetsize = arg4; +# endif +struct target_sigaction *act = NULL; +struct target_sigaction *oact = NULL; + +if (sigsetsize != sizeof(target_sigset_t)) { +return -TARGET_EINVAL; +} +if (arg2) { +if (!lock_user_struct(VERIFY_READ, act, arg2, 1)) { +return -TARGET_EFAULT; +} +# ifdef TARGET_ARCH_HAS_KA_RESTORER +act->ka_restorer = restorer; +# endif +} +if (arg3 && 
!lock_user_struct(VERIFY_WRITE, oact, arg3, 0)) { +ret = -TARGET_EFAULT; +} else { +ret = get_errno(do_sigaction(arg1, act, oact)); +} +if (act) { +unlock_user_struct(act, arg2, 0); +} +if (oact) { +unlock_user_struct(oact, arg3, 1); +} +#endif +return ret; +} + IMPL(setpgid) { return get_errno(setpgid(arg1, arg2)); @@ -8740,6 +8815,92 @@ IMPL(setsid) return get_errno(setsid()); } +#ifdef TARGET_NR_sigaction +IMPL(sigaction) +{ +abi_long ret; +# if defined(TARGET_ALPHA) +struct target_sigaction act, oact, *pact = NULL; +struct target_old_sigaction *old_act; +if (arg2) { +if (!lock_user_struct(VERIFY_READ, old_act, arg2, 1)) { +return -TARGET_EFAULT; +} +act._sa_handler = old_act->_sa_handler; +target_siginitset(&act.sa_mask, old_act->sa_mask); +act.sa_flags = old_act->sa_flags; +act.sa_restorer = 0; +unlock_user_struct(old_act, arg2, 0); +pact = &act; +} +ret = get_errno(do_sigaction(arg1, pact, &oact)); +if (!is_error(ret) && arg3) { +if (!lock_user_struct(VERIFY_WRITE, old_act, arg3, 0)) { +return -TARGET_EFAULT; +} +old_act->_sa_handler = oact._sa_handler; +old_act->sa_mask = oact.sa_mask.sig[0]; +old_act->sa_flags = oact.sa_flags; +unlock_user_struct(old_act, arg3, 1); +} +# elif defined(TARGET_MIPS) +struct target_sigaction act, oact, *pact = NULL, *old_act; +if (arg2) { +if (!lock_user_struct(VERIFY_READ, old_act, arg2, 1)) { +return -TARGET_EFAULT; +} + act._sa_handler = old_act->_sa_handler; + target_siginitset(&act.sa_mask, old_act->sa_mask.sig[0]); + act.sa_flags = old_act->sa_flags; + unlock_user_struct(old_act, arg2, 0); + pact = &act; +} +ret = get_errno(do_sigaction(arg1, pact, &oact)); +if (!is_error(ret) && arg3) { +if (!lock_user_struct(VERIFY_WRITE, old_act, arg3, 0)) { +return -TARGET_EFAULT; +} + old_act->_sa_handler = oact._sa_handler; + old_act->sa_flags = oact.sa_flags; + old_act->sa_mask.sig[0] = oact.sa_mask.sig[0]; + old_act->sa_mask.sig[1] = 0; + old_act->sa_mask.sig[2] = 0; + old_act->sa_mask.sig[3] = 0; + unlock_user_struct(old_act, 
arg3, 1); +} +# else +struct target_sigaction act, oact, *pact = NULL; +struct target_old_sigaction *old_act; +if (arg2) { +if (!lock_user_struct(VERIFY_READ, old_act, arg2, 1)) { +return -TARGET_EFAULT; +} +act._sa_handler = old_act->_sa
[Qemu-devel] [PATCH 26/33] linux-user: Split out acct, pipe, pipe2, times, umount2
Signed-off-by: Richard Henderson --- linux-user/syscall.c | 127 +++ 1 file changed, 80 insertions(+), 47 deletions(-) diff --git a/linux-user/syscall.c b/linux-user/syscall.c index 36092d753d..bde1f9872f 100644 --- a/linux-user/syscall.c +++ b/linux-user/syscall.c @@ -7912,6 +7912,24 @@ IMPL(access) } #endif +IMPL(acct) +{ +if (arg1 == 0) { +return get_errno(acct(NULL)); +} else { +char *fn = lock_user_string(arg1); +abi_long ret; + +if (!fn) { +return -TARGET_EFAULT; +} +TRY_INTERP_PATH(ret, fn, acct(fn)); +ret = get_errno(ret); +unlock_user(fn, arg1, 0); +return ret; +} +} + #ifdef TARGET_NR_alarm IMPL(alarm) { @@ -8529,6 +8547,21 @@ IMPL(pause) } #endif +#ifdef TARGET_NR_pipe +IMPL(pipe) +{ +return do_pipe(cpu_env, arg1, 0, 0); +} +#endif + +#ifdef TARGET_NR_pipe2 +IMPL(pipe2) +{ +return do_pipe(cpu_env, arg1, + target_to_host_bitmask(arg2, fcntl_flags_tbl), 1); +} +#endif + IMPL(read) { abi_long ret; @@ -8657,6 +8690,27 @@ IMPL(time) } #endif +IMPL(times) +{ +struct tms tms; +abi_long ret = get_errno(times(&tms)); +if (arg1) { +struct target_tms *tmsp += lock_user(VERIFY_WRITE, arg1, sizeof(struct target_tms), 0); +if (!tmsp) { +return -TARGET_EFAULT; +} +tmsp->tms_utime = tswapal(host_to_target_clock_t(tms.tms_utime)); +tmsp->tms_stime = tswapal(host_to_target_clock_t(tms.tms_stime)); +tmsp->tms_cutime = tswapal(host_to_target_clock_t(tms.tms_cutime)); +tmsp->tms_cstime = tswapal(host_to_target_clock_t(tms.tms_cstime)); +} +if (!is_error(ret)) { +ret = host_to_target_clock_t(ret); +} +return ret; +} + #ifdef TARGET_NR_umount IMPL(umount) { @@ -8672,6 +8726,21 @@ IMPL(umount) } #endif +#ifdef TARGET_NR_umount2 +IMPL(umount2) +{ +char *p = lock_user_string(arg1); +abi_long ret; + +if (!p) { +return -TARGET_EFAULT; +} +ret = get_errno(umount2(p, arg2)); +unlock_user(p, arg1, 0); +return ret; +} +#endif + #ifdef TARGET_NR_unlink IMPL(unlink) { @@ -8831,53 +8900,6 @@ IMPL(everything_else) char *fn; switch(num) { -#ifdef TARGET_NR_pipe -case TARGET_NR_pipe: -return 
do_pipe(cpu_env, arg1, 0, 0); -#endif -#ifdef TARGET_NR_pipe2 -case TARGET_NR_pipe2: -return do_pipe(cpu_env, arg1, - target_to_host_bitmask(arg2, fcntl_flags_tbl), 1); -#endif -case TARGET_NR_times: -{ -struct target_tms *tmsp; -struct tms tms; -ret = get_errno(times(&tms)); -if (arg1) { -tmsp = lock_user(VERIFY_WRITE, arg1, sizeof(struct target_tms), 0); -if (!tmsp) -return -TARGET_EFAULT; -tmsp->tms_utime = tswapal(host_to_target_clock_t(tms.tms_utime)); -tmsp->tms_stime = tswapal(host_to_target_clock_t(tms.tms_stime)); -tmsp->tms_cutime = tswapal(host_to_target_clock_t(tms.tms_cutime)); -tmsp->tms_cstime = tswapal(host_to_target_clock_t(tms.tms_cstime)); -} -if (!is_error(ret)) -ret = host_to_target_clock_t(ret); -} -return ret; -case TARGET_NR_acct: -if (arg1 == 0) { -ret = get_errno(acct(NULL)); -} else { -if (!(fn = lock_user_string(arg1))) { -return -TARGET_EFAULT; -} -TRY_INTERP_PATH(ret, fn, acct(fn)); -ret = get_errno(ret); -unlock_user(fn, arg1, 0); -} -return ret; -#ifdef TARGET_NR_umount2 -case TARGET_NR_umount2: -if (!(p = lock_user_string(arg1))) -return -TARGET_EFAULT; -ret = get_errno(umount2(p, arg2)); -unlock_user(p, arg1, 0); -return ret; -#endif case TARGET_NR_ioctl: return do_ioctl(arg1, arg2, arg3); #ifdef TARGET_NR_fcntl @@ -12937,6 +12959,7 @@ static impl_fn * const syscall_table[] = { #ifdef TARGET_NR_access [TARGET_NR_access] = impl_access, #endif +[TARGET_NR_acct] = impl_acct, #ifdef TARGET_NR_alarm [TARGET_NR_alarm] = impl_alarm, #endif @@ -13003,6 +13026,12 @@ static impl_fn * const syscall_table[] = { #endif #ifdef TARGET_NR_pause [TARGET_NR_pause] = impl_pause, +#endif +#ifdef TARGET_NR_pipe +[TARGET_NR_pipe] = impl_pipe, +#endif +#ifdef TARGET_NR_pipe2 +[TARGET_NR_pipe2] = impl_pipe2, #endif [TARGET_NR_read] = impl_read, #ifdef TARGET_NR_rename @@ -13027,9 +13056,13 @@ static impl_fn * const syscall_table[] = { #ifdef TARGET_NR_time [TARGET_NR_time] = impl_time, #endif +[TARGET_NR_times] = impl_times, #ifdef TARGET_NR_umount 
[TARGET_NR_umount] = impl_umount, #endif +#ifdef TARGET_NR_umount2 +[TARGET_NR_umount2] = impl_umount2, +#endif
Re: [Qemu-devel] [PATCH 00/33] linux-user: Begin splitting do_syscall
On 06/01/2018 12:30 AM, Richard Henderson wrote: > This function is, as I think everyone will agree, way too large. > This is about a third of the complete change, but I thought I'd > get some feedback on the method and form before I go any farther. Bah. I also meant to say Based-on: 20180531224911.23725-1-richard.hender...@linaro.org that is, the interp_prefix patch set from earlier today. r~
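The method the series applies, one handler function per syscall plus a table indexed by syscall number, can be sketched in isolation. This is a minimal illustration and not QEMU code: the two-argument signature, the `NR_*` numbers, `ERR_ENOSYS` and `dispatch()` are all assumptions made for the example. Only the `IMPL()` naming idea and the designated-initializer table mirror what the patches show.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; QEMU's real types and syscall numbers differ. */
typedef long abi_long;
enum { NR_getpid = 0, NR_add = 1, NR_unused = 2, NR_MAX = 3 };
#define ERR_ENOSYS 38

typedef abi_long impl_fn(abi_long arg1, abi_long arg2);

/* Declares one handler with a uniform signature, in the spirit of IMPL(). */
#define IMPL(name) static abi_long impl_##name(abi_long arg1, abi_long arg2)

IMPL(getpid)
{
    (void)arg1;
    (void)arg2;
    return 1234; /* fixed value keeps the sketch deterministic */
}

IMPL(add)
{
    return arg1 + arg2;
}

/* Slots without an initializer stay NULL and mean "not implemented". */
static impl_fn * const syscall_table[NR_MAX] = {
    [NR_getpid] = impl_getpid,
    [NR_add] = impl_add,
};

static_assert(sizeof(syscall_table) / sizeof(syscall_table[0]) == NR_MAX,
              "table must cover every syscall number");

/* Looks up the handler; out-of-range or empty slots report ENOSYS. */
abi_long dispatch(int num, abi_long arg1, abi_long arg2)
{
    impl_fn *fn;

    if (num < 0 || num >= NR_MAX) {
        return -ERR_ENOSYS;
    }
    fn = syscall_table[num];
    if (!fn) {
        return -ERR_ENOSYS;
    }
    return fn(arg1, arg2);
}
```

Each handler owns its local variables and its own return paths, which is what lets the large switch shrink one case at a time.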
[Qemu-devel] [PATCH 28/33] linux-user: Split out chroot, dup2, dup3, fcntl, setpgid, umask
Signed-off-by: Richard Henderson --- linux-user/syscall.c | 123 +++ 1 file changed, 79 insertions(+), 44 deletions(-) diff --git a/linux-user/syscall.c b/linux-user/syscall.c index 4be71367fc..4d9b9cad6e 100644 --- a/linux-user/syscall.c +++ b/linux-user/syscall.c @@ -7879,6 +7879,19 @@ IMPL(chmod) } #endif +IMPL(chroot) +{ +char *p = lock_user_string(arg1); +abi_long ret; + +if (!p) { +return -TARGET_EFAULT; +} +ret = get_errno(chroot(p)); +unlock_user(p, arg1, 0); +return ret; +} + IMPL(close) { if (is_hostfd(arg1)) { @@ -7918,6 +7931,43 @@ IMPL(dup) return ret; } +#ifdef TARGET_NR_dup2 +IMPL(dup2) +{ +abi_long ret; + +if (is_hostfd(arg1) || is_hostfd(arg2)) { +return -TARGET_EBADF; +} +ret = get_errno(dup2(arg1, arg2)); +if (ret >= 0) { +fd_trans_dup(arg1, arg2); +} +return ret; +} +#endif + +#if defined(TARGET_NR_dup3) && defined(CONFIG_DUP3) +IMPL(dup3) +{ +int host_flags; +abi_long ret; + +if (is_hostfd(arg1) || is_hostfd(arg2)) { +return -TARGET_EBADF; +} +if ((arg3 & ~TARGET_O_CLOEXEC) != 0) { +return -EINVAL; +} +host_flags = target_to_host_bitmask(arg3, fcntl_flags_tbl); +ret = get_errno(dup3(arg1, arg2, host_flags)); +if (ret >= 0) { +fd_trans_dup(arg1, arg2); +} +return ret; +} +#endif + IMPL(execve) { abi_ulong *guest_ptrs; @@ -8087,6 +8137,13 @@ IMPL(faccessat) } #endif +#ifdef TARGET_NR_fcntl +IMPL(fcntl) +{ +return do_fcntl(arg1, arg2, arg3); +} +#endif + #ifdef TARGET_NR_fork IMPL(fork) { @@ -8659,6 +8716,11 @@ IMPL(rmdir) } #endif +IMPL(setpgid) +{ +return get_errno(setpgid(arg1, arg2)); +} + #ifdef TARGET_NR_stime IMPL(stime) { @@ -8716,6 +8778,11 @@ IMPL(times) return ret; } +IMPL(umask) +{ +return get_errno(umask(arg1)); +} + #ifdef TARGET_NR_umount IMPL(umount) { @@ -8905,50 +8972,6 @@ IMPL(everything_else) char *fn; switch(num) { -#ifdef TARGET_NR_fcntl -case TARGET_NR_fcntl: -return do_fcntl(arg1, arg2, arg3); -#endif -case TARGET_NR_setpgid: -return get_errno(setpgid(arg1, arg2)); -case TARGET_NR_umask: -return get_errno(umask(arg1)); -case 
TARGET_NR_chroot: -if (!(p = lock_user_string(arg1))) -return -TARGET_EFAULT; -ret = get_errno(chroot(p)); -unlock_user(p, arg1, 0); -return ret; -#ifdef TARGET_NR_dup2 -case TARGET_NR_dup2: -if (is_hostfd(arg1) || is_hostfd(arg2)) { -return -TARGET_EBADF; -} -ret = get_errno(dup2(arg1, arg2)); -if (ret >= 0) { -fd_trans_dup(arg1, arg2); -} -return ret; -#endif -#if defined(CONFIG_DUP3) && defined(TARGET_NR_dup3) -case TARGET_NR_dup3: -{ -int host_flags; - -if (is_hostfd(arg1) || is_hostfd(arg2)) { -return -TARGET_EBADF; -} -if ((arg3 & ~TARGET_O_CLOEXEC) != 0) { -return -EINVAL; -} -host_flags = target_to_host_bitmask(arg3, fcntl_flags_tbl); -ret = get_errno(dup3(arg1, arg2, host_flags)); -if (ret >= 0) { -fd_trans_dup(arg1, arg2); -} -return ret; -} -#endif #ifdef TARGET_NR_getppid /* not on alpha */ case TARGET_NR_getppid: return get_errno(getppid()); @@ -12969,6 +12992,7 @@ static impl_fn * const syscall_table[] = { [TARGET_NR_brk] = impl_brk, [TARGET_NR_close] = impl_close, [TARGET_NR_chdir] = impl_chdir, +[TARGET_NR_chroot] = impl_chroot, #ifdef TARGET_NR_chmod [TARGET_NR_chmod] = impl_chmod, #endif @@ -12976,11 +13000,20 @@ static impl_fn * const syscall_table[] = { [TARGET_NR_creat] = impl_creat, #endif [TARGET_NR_dup] = impl_dup, +#ifdef TARGET_NR_dup2 +[TARGET_NR_dup2] = impl_dup2, +#endif +#if defined(TARGET_NR_dup3) && defined(CONFIG_DUP3) +[TARGET_NR_dup3] = impl_dup3, +#endif [TARGET_NR_execve] = impl_execve, [TARGET_NR_exit] = impl_exit, #ifdef TARGET_NR_faccessat [TARGET_NR_faccessat] = impl_faccessat, #endif +#ifdef TARGET_NR_fcntl +[TARGET_NR_fcntl] = impl_fcntl, +#endif #ifdef TARGET_NR_fork [TARGET_NR_fork] = impl_fork, #endif @@ -13050,6 +13083,7 @@ static impl_fn * const syscall_table[] = { #ifdef TARGET_NR_rmdir [TARGET_NR_rmdir] = impl_rmdir, #endif +[TARGET_NR_setpgid] = impl_setpgid, #ifdef TARGET_NR_stime [TARGET_NR_stime] = impl_stime, #endif @@ -13061,6 +13095,7 @@ static impl_fn * const syscall_table[] = { [TARGET_NR_time] = impl_time, 
#endif [TARGET_NR_times] = impl_times, +[TARGET_NR_umask] = impl_umask, #ifdef TARGET_NR_umount [TARGET_NR_umount] = impl_umount, #endif -- 2.17.0
Re: [Qemu-devel] [PATCH 01/33] linux-user: Split out do_syscall1
On 01/06/2018 at 09:30, Richard Henderson wrote:
> There was supposed to be a single point of return for do_syscall
> so that tracing works properly. However, there are a few bugs
> in that area. It is significantly simpler to simply split out
> an inner function to enforce this.
>
> Signed-off-by: Richard Henderson
> ---
> linux-user/syscall.c | 89 +++-
> 1 file changed, 54 insertions(+), 35 deletions(-)

Reviewed-by: Laurent Vivier
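The idea in the commit message, an inner function that may return from anywhere and an outer function with exactly one return, can be sketched as below. The counters stand in for the tracing hooks and are an assumption of this example, as are the simplified signatures and return values; the real do_syscall takes more arguments.

```c
#include <assert.h>

/* Hypothetical trace counters standing in for the real tracing calls. */
static int trace_enter_count;
static int trace_exit_count;

/* Inner function: free to return early from any branch. */
static long do_syscall1(int num, long arg)
{
    if (num < 0) {
        return -38; /* illustrative "not implemented" error */
    }
    if (num == 0) {
        return arg;
    }
    return arg * 2;
}

/* Outer function: the single return point, so entry and exit always pair. */
long do_syscall(int num, long arg)
{
    long ret;

    trace_enter_count++;
    ret = do_syscall1(num, arg);
    trace_exit_count++;
    assert(trace_enter_count == trace_exit_count);
    return ret;
}
```

Because the exit trace sits in the wrapper, an early `return` added to the inner function later cannot skip it.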
Re: [Qemu-devel] [PATCH v7 3/5] migration: API to clear bits of guest free pages from the dirty bitmap
On 06/01/2018 12:00 PM, Peter Xu wrote:
On Tue, Apr 24, 2018 at 02:13:46PM +0800, Wei Wang wrote:

This patch adds an API to clear bits corresponding to guest free pages from the dirty bitmap. Split the free page block if it crosses the QEMU RAMBlock boundary.

Signed-off-by: Wei Wang
CC: Dr. David Alan Gilbert
CC: Juan Quintela
CC: Michael S. Tsirkin
---
 include/migration/misc.h | 2 ++
 migration/ram.c | 44
 2 files changed, 46 insertions(+)

diff --git a/include/migration/misc.h b/include/migration/misc.h
index 4ebf24c..113320e 100644
--- a/include/migration/misc.h
+++ b/include/migration/misc.h
@@ -14,11 +14,13 @@
 #ifndef MIGRATION_MISC_H
 #define MIGRATION_MISC_H
 
+#include "exec/cpu-common.h"
 #include "qemu/notify.h"
 
 /* migration/ram.c */
 void ram_mig_init(void);
+void qemu_guest_free_page_hint(void *addr, size_t len);
 
 /* migration/block.c */
 
diff --git a/migration/ram.c b/migration/ram.c
index 9a72b1a..0147548 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -2198,6 +2198,50 @@ static int ram_init_all(RAMState **rsp)
 }
 
 /*
+ * This function clears bits of the free pages reported by the caller from the
+ * migration dirty bitmap. @addr is the host address corresponding to the
+ * start of the continuous guest free pages, and @len is the total bytes of
+ * those pages.
+ */
+void qemu_guest_free_page_hint(void *addr, size_t len)
+{
+    RAMBlock *block;
+    ram_addr_t offset;
+    size_t used_len, start, npages;

Do we need to check here on whether a migration is in progress? Since if not I'm not sure whether this hint still makes any sense any more, and more importantly it seems to me that block->bmap below at [1] is only valid during a migration. So I'm not sure whether QEMU will crash if this function is called without a running migration.

OK. How about just adding comments above to have users noted that this function should be used during migration? If we want to do a sanity check here, I think it would be easier to just check !block->bmap here.

+
+    for (; len > 0; len -= used_len) {
+        block = qemu_ram_block_from_host(addr, false, &offset);
+        if (unlikely(!block)) {
+            return;

We should never reach here, should we? Assuming the callers of this function should always pass in a correct host address. If we are very sure that the host addr should be valid, could we just assert?

Probably not the case, because of the corner case that the memory would be hot unplugged after the free page is reported to QEMU.

+        }
+
+        /*
+         * This handles the case that the RAMBlock is resized after the free
+         * page hint is reported.
+         */
+        if (unlikely(offset > block->used_length)) {
+            return;
+        }
+
+        if (len <= block->used_length - offset) {
+            used_len = len;
+        } else {
+            used_len = block->used_length - offset;
+            addr += used_len;
+        }
+
+        start = offset >> TARGET_PAGE_BITS;
+        npages = used_len >> TARGET_PAGE_BITS;
+
+        qemu_mutex_lock(&ram_state->bitmap_mutex);

So now I think I understand the lock can still be meaningful since this function now can be called outside the migration thread (e.g., in vcpu thread). But still it would be nice to mention it somewhere on the truth of the lock.

Yes. Thanks for the reminder. I will add some explanation to the patch 2 commit log.

Best,
Wei
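The two points discussed in this review, clamping a hint to the part of the block that is in use and skipping the work when no dirty bitmap exists, can be illustrated without any QEMU types. This is only a sketch under stated assumptions: `struct block`, `clear_free_hint()`, `clear_bit_range()` and the 4 KiB `PAGE_BITS` are hypothetical, and locking is left out entirely.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_BITS 12 /* assumed 4 KiB pages */

/* Hypothetical, much reduced stand-in for a RAM block. */
struct block {
    size_t used_length;  /* bytes currently in use */
    unsigned long *bmap; /* one dirty bit per page; NULL outside migration */
};

/* Clears n bits starting at bit index start. */
static void clear_bit_range(unsigned long *map, size_t start, size_t n)
{
    const size_t bits = sizeof(unsigned long) * 8;
    size_t i;

    for (i = start; i < start + n; i++) {
        map[i / bits] &= ~(1UL << (i % bits));
    }
}

/*
 * Clears the dirty bits for a free range that starts at offset within b.
 * Returns the number of bytes that fell inside this block, so a caller
 * can continue with the remainder in the next block.
 */
size_t clear_free_hint(struct block *b, size_t offset, size_t len)
{
    size_t used_len;

    /* The sketch assumes page-aligned hints. */
    assert((offset & (((size_t)1 << PAGE_BITS) - 1)) == 0);

    /* No bitmap (no migration running) or a stale offset: do nothing. */
    if (!b->bmap || offset >= b->used_length) {
        return 0;
    }
    if (len <= b->used_length - offset) {
        used_len = len;
    } else {
        used_len = b->used_length - offset;
    }
    clear_bit_range(b->bmap, offset >> PAGE_BITS, used_len >> PAGE_BITS);
    return used_len;
}
```

Returning the consumed length instead of looping keeps the sketch short; a caller handling a range that spans several blocks would advance its address and length by the returned value each time.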
Re: [Qemu-devel] About cpu_physical_memory_map()
Hi Peter,

Thank you a lot for the analysis!

> So it'll be simpler
> if you start with the buffer in the host QEMU process, map this
> in to the guest's physical address space at some GPA, tell the
> guest kernel that that's the GPA to use, and have the guest kernel
> map that GPA into the guest userspace process's virtual address space.
> (Think of how you would map a framebuffer, for instance.)

This makes sense to me. Could you help provide a pointer where I can refer to similar implementations? Should I do something like this during system memory initialization:

    memory_region_init_ram_ptr(my_mr, owner, "mybuf", buf_size, buf); // where buf is the buffer in QEMU AS
    memory_region_add_subregion(system_memory, GPA_OFFSET, my_mr);

If I set guest memory to be "-m 1G", can I make "GPA_OFFSET" beyond 1GB (e.g. 2GB)? This way, the guest OS won't be able to access my buffer and use it like other regular RAM.

Thanks!

Best,
Huaicheng

On Thu, May 31, 2018 at 3:11 AM Peter Maydell wrote:
> On 30 May 2018 at 01:24, Huaicheng Li wrote:
> > Dear QEMU/KVM developers,
> >
> > I was trying to map a buffer in host QEMU process to a guest user space
> > application. I tried to achieve this
> > by allocating a buffer in the guest application first, then map this buffer
> > to QEMU process address space via
> > GVA -> GPA --> HVA (GPA to HVA is done via cpu_physical_memory_map). Last,
> > I wrote a host kernel driver to
> > walk QEMU process's page table and change corresponding page table entries
> > of HVA to the HPA of the target
> > buffer.
>
> This seems like the wrong way round to try to do this. As a rule
> of thumb, you'll have an easier life if you have things behave
> similarly to how they would in real hardware. So it'll be simpler
> if you start with the buffer in the host QEMU process, map this
> in to the guest's physical address space at some GPA, tell the
> guest kernel that that's the GPA to use, and have the guest kernel
> map that GPA into the guest userspace process's virtual address space.
> (Think of how you would map a framebuffer, for instance.)
>
> Changing the host page table entries for QEMU under its feet seems
> like it's never going to work reliably.
>
> (I think the specific problem you're running into is that guest memory
> is both mapped into the QEMU host process and also exposed to the
> guest VM. The former is controlled by the page tables for the
> QEMU host process, but the latter is a different set of page tables,
> which QEMU asks the kernel to configure, using KVM_SET_USER_MEMORY_REGION
> ioctls.)
>
> thanks
> -- PMM
>
[Qemu-devel] [PATCH V6 1/7] memory, exec: Expose all memory block related flags.
From: Junyan He

We need to use these flags in other files rather than just in exec.c. For example, RAM_SHARED should be used when creating a RAM block from a file. We expose them in exec/memory.h.

Signed-off-by: Junyan He
---
 exec.c | 17 -
 include/exec/memory.h | 17 +
 2 files changed, 17 insertions(+), 17 deletions(-)

diff --git a/exec.c b/exec.c
index c30f905..302c04b 100644
--- a/exec.c
+++ b/exec.c
@@ -87,23 +87,6 @@ AddressSpace address_space_memory;
 
 MemoryRegion io_mem_rom, io_mem_notdirty;
 static MemoryRegion io_mem_unassigned;
-
-/* RAM is pre-allocated and passed into qemu_ram_alloc_from_ptr */
-#define RAM_PREALLOC (1 << 0)
-
-/* RAM is mmap-ed with MAP_SHARED */
-#define RAM_SHARED (1 << 1)
-
-/* Only a portion of RAM (used_length) is actually used, and migrated.
- * This used_length size can change across reboots.
- */
-#define RAM_RESIZEABLE (1 << 2)
-
-/* UFFDIO_ZEROPAGE is available on this RAMBlock to atomically
- * zero the page and wake waiting processes.
- * (Set during postcopy)
- */
-#define RAM_UF_ZEROPAGE (1 << 3)
 #endif
 
 #ifdef TARGET_PAGE_BITS_VARY
diff --git a/include/exec/memory.h b/include/exec/memory.h
index 67ea7fe..3da315e 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -102,6 +102,23 @@ struct IOMMUNotifier {
 };
 typedef struct IOMMUNotifier IOMMUNotifier;
 
+/* RAM is pre-allocated and passed into qemu_ram_alloc_from_ptr */
+#define RAM_PREALLOC (1 << 0)
+
+/* RAM is mmap-ed with MAP_SHARED */
+#define RAM_SHARED (1 << 1)
+
+/* Only a portion of RAM (used_length) is actually used, and migrated.
+ * This used_length size can change across reboots.
+ */
+#define RAM_RESIZEABLE (1 << 2)
+
+/* UFFDIO_ZEROPAGE is available on this RAMBlock to atomically
+ * zero the page and wake waiting processes.
+ * (Set during postcopy)
+ */
+#define RAM_UF_ZEROPAGE (1 << 3)
+
 static inline void iommu_notifier_init(IOMMUNotifier *n, IOMMUNotify fn,
                                        IOMMUNotifierFlag flags,
                                        hwaddr start, hwaddr end)
-- 
2.7.4
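Once the flags live in a shared header, a caller outside exec.c can build and test a flags word directly. The sketch below copies the four values from the patch; the helper functions `file_block_flags()` and `block_is_shared()` are hypothetical and exist only to show the bit operations.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Flag values as listed in the patch. */
#define RAM_PREALLOC    (1 << 0)
#define RAM_SHARED      (1 << 1)
#define RAM_RESIZEABLE  (1 << 2)
#define RAM_UF_ZEROPAGE (1 << 3)

/* Distinct bits are what make it safe to OR the flags together. */
static_assert((RAM_PREALLOC | RAM_SHARED | RAM_RESIZEABLE |
               RAM_UF_ZEROPAGE) == 0xf,
              "RAM flags must occupy distinct bits");

/* Hypothetical helper: flags for a file-backed block. */
uint32_t file_block_flags(bool share)
{
    return share ? RAM_SHARED : 0;
}

/* Hypothetical helper: tests one flag without disturbing the others. */
bool block_is_shared(uint32_t flags)
{
    return (flags & RAM_SHARED) != 0;
}
```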
[Qemu-devel] [PATCH V6 0/7] nvdimm: guarantee persistence of QEMU writes to persistent memory
From: Junyan He

QEMU writes to vNVDIMM backends in the vNVDIMM label emulation and live
migration. If the backend is on persistent memory, QEMU needs to take
proper operations to ensure that its writes are persistent. Otherwise, a
host power failure may result in the loss of guest data on the
persistent memory.

This v3 patch series is based on Marcel's patch "mem: add share
parameter to memory-backend-ram" [1] because of the changes in patch 1.

[1] https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg03858.html

Previous versions can be found at
v5: https://lists.gnu.org/archive/html/qemu-devel/2018-05/msg02258.html
v4: https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg06993.html
v3: https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg04365.html
v2: https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg01579.html
v1: https://lists.gnu.org/archive/html/qemu-devel/2017-12/msg05040.html

Changes in v6:
* (Patch 1) Expose all ram block flags rather than redefine the flags.
* (Patch 4) Use pkg-config rather than a hard-coded check in configure.
* (Patch 7) Sync and flush all the pmem data when migration completes,
  rather than syncing pages one by one as in the previous version.

Changes in v5:
* (Patch 9) Add post copy check and output some messages for nvdimm.

Changes in v4:
* (Patch 2) Fix compilation errors found by patchew.

Changes in v3:
* (Patch 5) Add an is_pmem flag to ram_handle_compressed() and handle
  PMEM writes in it, so we don't need the _common function.
* (Patch 6) Expose qemu_get_buffer_common so we can remove the
  unnecessary qemu_get_buffer_to_pmem wrapper.
* (Patch 8) Add an is_pmem flag to xbzrle_decode_buffer() and handle
  PMEM writes in it, so we can remove the unnecessary
  xbzrle_decode_buffer_{common, to_pmem}.
* Move libpmem stubs to stubs/pmem.c and fix the compilation failures
  of test-{xbzrle,vmstate}.c.

Changes in v2:
* (Patch 1) Use a flags parameter in file ram allocation functions.
* (Patch 2) Add a new option 'pmem' to hostmem-file.
* (Patch 3) Use libpmem to operate on the persistent memory, rather
  than re-implementing those operations in QEMU.
* (Patch 5-8) Consider the write persistence in the migration path.

Junyan:
[1/7] memory, exec: Expose all memory block related flags.
[6/7] migration/ram: Add check and info message to nvdimm post copy.
[7/7] migration/ram: ensure write persistence on loading all data to PMEM.

Haozhong:
[5/7] mem/nvdimm: ensure write persistence to PMEM in label emulation

Haozhong & Junyan:
[2/7] memory, exec: switch file ram allocation functions to 'flags' parameters
[3/7] hostmem-file: add the 'pmem' option
[4/7] configure: add libpmem support

Signed-off-by: Haozhong Zhang
Signed-off-by: Junyan He
---
 backends/hostmem-file.c | 28 +++-
 configure               | 29 +
 docs/nvdimm.txt         | 14 ++
 exec.c                  | 36 ++--
 hw/mem/nvdimm.c         |  9 -
 include/exec/memory.h   | 31 +--
 include/exec/ram_addr.h | 28 ++--
 include/qemu/pmem.h     | 24
 memory.c                |  8 +---
 migration/ram.c         | 18 ++
 numa.c                  |  2 +-
 qemu-options.hx         |  7 +++
 stubs/Makefile.objs     |  1 +
 stubs/pmem.c            | 23 +++
 14 files changed, 226 insertions(+), 32 deletions(-)

-- 
2.7.4
[Qemu-devel] [PATCH V6 2/7] memory, exec: switch file ram allocation functions to 'flags' parameters
From: Junyan He As more flag parameters besides the existing 'share' are going to be added to following functions memory_region_init_ram_from_file qemu_ram_alloc_from_fd qemu_ram_alloc_from_file let's switch them to use the 'flags' parameters so as to ease future flag additions. The existing 'share' flag is converted to the RAM_SHARED bit in ram_flags, and other flag bits are ignored by above functions right now. Signed-off-by: Junyan He Signed-off-by: Haozhong Zhang --- backends/hostmem-file.c | 3 ++- exec.c | 10 +- include/exec/memory.h | 8 ++-- include/exec/ram_addr.h | 25 +++-- memory.c| 8 +--- numa.c | 2 +- 6 files changed, 42 insertions(+), 14 deletions(-) diff --git a/backends/hostmem-file.c b/backends/hostmem-file.c index 134b08d..34c68bb 100644 --- a/backends/hostmem-file.c +++ b/backends/hostmem-file.c @@ -58,7 +58,8 @@ file_backend_memory_alloc(HostMemoryBackend *backend, Error **errp) path = object_get_canonical_path(OBJECT(backend)); memory_region_init_ram_from_file(&backend->mr, OBJECT(backend), path, - backend->size, fb->align, backend->share, + backend->size, fb->align, + backend->share ? RAM_SHARED : 0, fb->mem_path, errp); g_free(path); } diff --git a/exec.c b/exec.c index 302c04b..f2082fa 100644 --- a/exec.c +++ b/exec.c @@ -2054,7 +2054,7 @@ static void ram_block_add(RAMBlock *new_block, Error **errp, bool shared) #ifdef __linux__ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr, - bool share, int fd, + uint64_t ram_flags, int fd, Error **errp) { RAMBlock *new_block; @@ -2096,14 +2096,14 @@ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr, new_block->mr = mr; new_block->used_length = size; new_block->max_length = size; -new_block->flags = share ? 
RAM_SHARED : 0; +new_block->flags = ram_flags; new_block->host = file_ram_alloc(new_block, size, fd, !file_size, errp); if (!new_block->host) { g_free(new_block); return NULL; } -ram_block_add(new_block, &local_err, share); +ram_block_add(new_block, &local_err, ram_flags & RAM_SHARED); if (local_err) { g_free(new_block); error_propagate(errp, local_err); @@ -2115,7 +2115,7 @@ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr, RAMBlock *qemu_ram_alloc_from_file(ram_addr_t size, MemoryRegion *mr, - bool share, const char *mem_path, + uint64_t ram_flags, const char *mem_path, Error **errp) { int fd; @@ -2127,7 +2127,7 @@ RAMBlock *qemu_ram_alloc_from_file(ram_addr_t size, MemoryRegion *mr, return NULL; } -block = qemu_ram_alloc_from_fd(size, mr, share, fd, errp); +block = qemu_ram_alloc_from_fd(size, mr, ram_flags, fd, errp); if (!block) { if (created) { unlink(mem_path); diff --git a/include/exec/memory.h b/include/exec/memory.h index 3da315e..3b68a43 100644 --- a/include/exec/memory.h +++ b/include/exec/memory.h @@ -596,6 +596,7 @@ void memory_region_init_resizeable_ram(MemoryRegion *mr, void *host), Error **errp); #ifdef __linux__ + /** * memory_region_init_ram_from_file: Initialize RAM memory region with a *mmap-ed backend. @@ -607,7 +608,10 @@ void memory_region_init_resizeable_ram(MemoryRegion *mr, * @size: size of the region. * @align: alignment of the region base address; if 0, the default alignment * (getpagesize()) will be used. - * @share: %true if memory must be mmaped with the MAP_SHARED flag + * @ram_flags: specify properties of this memory region, which can be one or + * bit-or of following values: + * - RAM_SHARED: memory must be mmaped with the MAP_SHARED flag + * Other bits are ignored. * @path: the path in which to allocate the RAM. * @errp: pointer to Error*, to store an error if it happens. 
* @@ -619,7 +623,7 @@ void memory_region_init_ram_from_file(MemoryRegion *mr, const char *name, uint64_t size, uint64_t align, - bool share, + uint64_t ram_flags, const char *path, Error **errp); diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h index cf24
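A standalone sketch of the bool-to-bitmask conversion this patch performs may help when reviewing the call sites. The two helper names below are hypothetical and do not appear in the series; only the RAM_* values come from patch 1/7.

```c
#include <stdbool.h>
#include <stdint.h>

/* Flag values as exposed in include/exec/memory.h by patch 1/7. */
#define RAM_PREALLOC (1 << 0)
#define RAM_SHARED   (1 << 1)

/* Hypothetical helper (not in the series): map the legacy 'share'
 * boolean onto the ram_flags bitmask, as hostmem-file.c does inline
 * with "backend->share ? RAM_SHARED : 0". */
static uint64_t share_to_ram_flags(bool share)
{
    return share ? RAM_SHARED : 0;
}

/* Hypothetical helper (not in the series): recover the boolean that
 * ram_block_add() still takes, mirroring "ram_flags & RAM_SHARED" in
 * qemu_ram_alloc_from_fd(). */
static bool ram_flags_shared(uint64_t ram_flags)
{
    return (ram_flags & RAM_SHARED) != 0;
}
```

With a bitmask, a later flag can be OR-ed into the same argument without touching the function signatures again, which is the stated motivation of the patch.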
[Qemu-devel] [PATCH V6 4/7] configure: add libpmem support
From: Junyan He

Add a pair of configure options --{enable,disable}-libpmem to control
whether QEMU is compiled with PMDK libpmem [1].

QEMU may write to the host persistent memory (e.g. in vNVDIMM label
emulation and live migration), so it must take the proper operations to
ensure the persistence of its own writes. Depending on the CPU models
and available instructions, the optimal operation can vary [2]. PMDK
libpmem has already implemented those operations on multiple CPU models
(x86 and ARM) and the logic to select the optimal ones, so QEMU can just
use libpmem rather than re-implement them.

[1] PMDK (formerly known as NVML), https://github.com/pmem/pmdk/
[2] https://github.com/pmem/pmdk/blob/38bfa652721a37fd94c0130ce0e3f5d8baa3ed40/src/libpmem/pmem.c#L33

Signed-off-by: Junyan He
Signed-off-by: Haozhong Zhang
---
 configure | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)

diff --git a/configure b/configure
index a6a4616..f44d669 100755
--- a/configure
+++ b/configure
@@ -456,6 +456,7 @@ jemalloc="no"
 replication="yes"
 vxhs=""
 libxml2=""
+libpmem=""
 
 supported_cpu="no"
 supported_os="no"
@@ -1381,6 +1382,10 @@ for opt do
   ;;
   --disable-git-update) git_update=no
   ;;
+  --enable-libpmem) libpmem=yes
+  ;;
+  --disable-libpmem) libpmem=no
+  ;;
   *)
       echo "ERROR: unknown option $opt"
       echo "Try '$0 --help' for more information"
@@ -1638,6 +1643,7 @@ disabled with --disable-FEATURE, default is enabled if available:
   crypto-afalg    Linux AF_ALG crypto backend driver
   vhost-user      vhost-user support
   capstone        capstone disassembler support
+  libpmem         libpmem support
 
 NOTE: The object files are built at the place where configure is launched
 EOF
@@ -5445,6 +5451,24 @@ EOF
 fi
 
 ##
+# check for libpmem
+
+if test "$libpmem" != "no"; then
+  if $pkg_config --exists "libpmem"; then
+    libpmem="yes"
+    libpmem_libs=$($pkg_config --libs libpmem)
+    libpmem_cflags=$($pkg_config --cflags libpmem)
+    libs_softmmu="$libs_softmmu $libpmem_libs"
+    QEMU_CFLAGS="$QEMU_CFLAGS $libpmem_cflags"
+  else
+    if test "$libpmem" = "yes" ; then
+      feature_not_found "libpmem" "Install nvml or pmdk"
+    fi
+    libpmem="no"
+  fi
+fi
+
+##
 # End of CC checks
 # After here, no more $cc or $ld runs
 
@@ -5907,6 +5931,7 @@ echo "avx2 optimization $avx2_opt"
 echo "replication support $replication"
 echo "VxHS block device $vxhs"
 echo "capstone $capstone"
+echo "libpmem support $libpmem"
 
 if test "$sdl_too_old" = "yes"; then
 echo "-> Your SDL version is too old - please upgrade to have SDL support"
@@ -6651,6 +6676,10 @@ if test "$vxhs" = "yes" ; then
   echo "VXHS_LIBS=$vxhs_libs" >> $config_host_mak
 fi
 
+if test "$libpmem" = "yes" ; then
+  echo "CONFIG_LIBPMEM=y" >> $config_host_mak
+fi
+
 if test "$tcg_interpreter" = "yes"; then
   QEMU_INCLUDES="-iquote \$(SRC_PATH)/tcg/tci $QEMU_INCLUDES"
 elif test "$ARCH" = "sparc64" ; then
-- 
2.7.4
[Qemu-devel] [PATCH V6 5/7] mem/nvdimm: ensure write persistence to PMEM in label emulation
From: Junyan He

Guest writes to vNVDIMM labels are intercepted and performed on the
backend by QEMU. When the backend is real persistent memory, QEMU needs
to take proper operations to ensure that its writes are persistent on
the persistent memory. Otherwise, a host power failure may result in the
loss of guest label configurations.

Signed-off-by: Haozhong Zhang
---
 hw/mem/nvdimm.c     |  9 ++++++++-
 include/qemu/pmem.h | 23 +++++++++++++++++++++++
 stubs/Makefile.objs |  1 +
 stubs/pmem.c        | 19 +++++++++++++++++++
 4 files changed, 51 insertions(+), 1 deletion(-)
 create mode 100644 include/qemu/pmem.h
 create mode 100644 stubs/pmem.c

diff --git a/hw/mem/nvdimm.c b/hw/mem/nvdimm.c
index 4087aca..03b478e 100644
--- a/hw/mem/nvdimm.c
+++ b/hw/mem/nvdimm.c
@@ -23,6 +23,7 @@
  */
 
 #include "qemu/osdep.h"
+#include "qemu/pmem.h"
 #include "qapi/error.h"
 #include "qapi/visitor.h"
 #include "hw/mem/nvdimm.h"
@@ -155,11 +156,17 @@ static void nvdimm_write_label_data(NVDIMMDevice *nvdimm, const void *buf,
 {
     MemoryRegion *mr;
     PCDIMMDevice *dimm = PC_DIMM(nvdimm);
+    bool is_pmem = object_property_get_bool(OBJECT(dimm->hostmem),
+                                            "pmem", NULL);
     uint64_t backend_offset;
 
     nvdimm_validate_rw_label_data(nvdimm, size, offset);
 
-    memcpy(nvdimm->label_data + offset, buf, size);
+    if (!is_pmem) {
+        memcpy(nvdimm->label_data + offset, buf, size);
+    } else {
+        pmem_memcpy_persist(nvdimm->label_data + offset, buf, size);
+    }
 
     mr = host_memory_backend_get_memory(dimm->hostmem, &error_abort);
     backend_offset = memory_region_size(mr) - nvdimm->label_size + offset;
diff --git a/include/qemu/pmem.h b/include/qemu/pmem.h
new file mode 100644
index 000..00d6680
--- /dev/null
+++ b/include/qemu/pmem.h
@@ -0,0 +1,23 @@
+/*
+ * QEMU header file for libpmem.
+ *
+ * Copyright (c) 2018 Intel Corporation.
+ *
+ * Author: Haozhong Zhang
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#ifndef QEMU_PMEM_H
+#define QEMU_PMEM_H
+
+#ifdef CONFIG_LIBPMEM
+#include <libpmem.h>
+#else /* !CONFIG_LIBPMEM */
+
+void *pmem_memcpy_persist(void *pmemdest, const void *src, size_t len);
+
+#endif /* CONFIG_LIBPMEM */
+
+#endif /* !QEMU_PMEM_H */
diff --git a/stubs/Makefile.objs b/stubs/Makefile.objs
index 53d3f32..be9a042 100644
--- a/stubs/Makefile.objs
+++ b/stubs/Makefile.objs
@@ -43,3 +43,4 @@ stub-obj-y += xen-common.o
 stub-obj-y += xen-hvm.o
 stub-obj-y += pci-host-piix.o
 stub-obj-y += ram-block.o
+stub-obj-$(call lnot,$(CONFIG_LIBPMEM)) += pmem.o
\ No newline at end of file
diff --git a/stubs/pmem.c b/stubs/pmem.c
new file mode 100644
index 000..b4ec72d
--- /dev/null
+++ b/stubs/pmem.c
@@ -0,0 +1,19 @@
+/*
+ * Stubs for libpmem.
+ *
+ * Copyright (c) 2018 Intel Corporation.
+ *
+ * Author: Haozhong Zhang
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include <string.h>
+
+#include "qemu/pmem.h"
+
+void *pmem_memcpy_persist(void *pmemdest, const void *src, size_t len)
+{
+    return memcpy(pmemdest, src, len);
+}
-- 
2.7.4
[Qemu-devel] [PATCH V6 6/7] migration/ram: Add check and info message to nvdimm post copy.
From: Junyan He

NVDIMM-backed memory does not support postcopy yet. Disable postcopy if
any RAM block is NVDIMM memory, and print a hint to the user.

Signed-off-by: Junyan He
---
 migration/ram.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/migration/ram.c b/migration/ram.c
index c53e836..aa0c6f0 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -3397,6 +3397,15 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
 
 static bool ram_has_postcopy(void *opaque)
 {
+    RAMBlock *rb;
+    RAMBLOCK_FOREACH(rb) {
+        if (ramblock_is_pmem(rb)) {
+            info_report("Block: %s, host: %p is a nvdimm memory, postcopy"
+                        "is not supported now!", rb->idstr, rb->host);
+            return false;
+        }
+    }
+
     return migrate_postcopy_ram();
 }
 
-- 
2.7.4
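One detail in the info_report() call is worth checking: the format string is split across two adjacent literals with no space at the join, so C concatenates them into "postcopyis". The sketch below reproduces that with plain snprintf(); both function names are hypothetical, and the format is shortened (the %p argument is dropped) to keep the output deterministic.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format the message the way the posted hunk does,
 * with two adjacent string literals and no separating space. */
static int format_as_posted(char *out, size_t n, const char *idstr)
{
    return snprintf(out, n, "Block: %s is a nvdimm memory, postcopy"
                            "is not supported now!", idstr);
}

/* Hypothetical helper: the same message with a trailing space added to
 * the first literal, so the words no longer run together. */
static int format_with_space(char *out, size_t n, const char *idstr)
{
    return snprintf(out, n, "Block: %s is a nvdimm memory, postcopy "
                            "is not supported now!", idstr);
}
```

A trailing space in the first literal (or a leading one in the second) would be enough to fix the message if the series is respun.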
[Qemu-devel] [PATCH V6 3/7] hostmem-file: add the 'pmem' option
From: Junyan He When QEMU emulates vNVDIMM labels and migrates vNVDIMM devices, it needs to know whether the backend storage is a real persistent memory, in order to decide whether special operations should be performed to ensure the data persistence. This boolean option 'pmem' allows users to specify whether the backend storage of memory-backend-file is a real persistent memory. If 'pmem=on', QEMU will set the flag RAM_PMEM in the RAM block of the corresponding memory region. Signed-off-by: Junyan He Signed-off-by: Haozhong Zhang --- backends/hostmem-file.c | 27 ++- docs/nvdimm.txt | 14 ++ exec.c | 9 + include/exec/memory.h | 6 ++ include/exec/ram_addr.h | 3 +++ qemu-options.hx | 7 +++ 6 files changed, 65 insertions(+), 1 deletion(-) diff --git a/backends/hostmem-file.c b/backends/hostmem-file.c index 34c68bb..ccca7a1 100644 --- a/backends/hostmem-file.c +++ b/backends/hostmem-file.c @@ -34,6 +34,7 @@ struct HostMemoryBackendFile { bool discard_data; char *mem_path; uint64_t align; +bool is_pmem; }; static void @@ -59,7 +60,8 @@ file_backend_memory_alloc(HostMemoryBackend *backend, Error **errp) memory_region_init_ram_from_file(&backend->mr, OBJECT(backend), path, backend->size, fb->align, - backend->share ? RAM_SHARED : 0, + (backend->share ? RAM_SHARED : 0) | + (fb->is_pmem ? 
RAM_PMEM : 0), fb->mem_path, errp); g_free(path); } @@ -131,6 +133,26 @@ static void file_memory_backend_set_align(Object *o, Visitor *v, error_propagate(errp, local_err); } +static bool file_memory_backend_get_pmem(Object *o, Error **errp) +{ +return MEMORY_BACKEND_FILE(o)->is_pmem; +} + +static void file_memory_backend_set_pmem(Object *o, bool value, Error **errp) +{ +HostMemoryBackend *backend = MEMORY_BACKEND(o); +HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(o); + +if (host_memory_backend_mr_inited(backend)) { +error_setg(errp, "cannot change property 'pmem' of %s '%s'", + object_get_typename(o), + object_get_canonical_path_component(OBJECT(backend))); +return; +} + +fb->is_pmem = value; +} + static void file_backend_unparent(Object *obj) { HostMemoryBackend *backend = MEMORY_BACKEND(obj); @@ -162,6 +184,9 @@ file_backend_class_init(ObjectClass *oc, void *data) file_memory_backend_get_align, file_memory_backend_set_align, NULL, NULL, &error_abort); +object_class_property_add_bool(oc, "pmem", +file_memory_backend_get_pmem, file_memory_backend_set_pmem, +&error_abort); } static void file_backend_instance_finalize(Object *o) diff --git a/docs/nvdimm.txt b/docs/nvdimm.txt index e903d8b..bcb2032 100644 --- a/docs/nvdimm.txt +++ b/docs/nvdimm.txt @@ -153,3 +153,17 @@ guest NVDIMM region mapping structure. This unarmed flag indicates guest software that this vNVDIMM device contains a region that cannot accept persistent writes. In result, for example, the guest Linux NVDIMM driver, marks such vNVDIMM device as read-only. + +If the vNVDIMM backend is on the host persistent memory that can be +accessed in SNIA NVM Programming Model [1] (e.g., Intel NVDIMM), it's +suggested to set the 'pmem' option of memory-backend-file to 'on'. 
When +'pmem=on' and QEMU is built with libpmem [2] support (configured with +--enable-libpmem), QEMU will take necessary operations to guarantee +the persistence of its own writes to the vNVDIMM backend (e.g., in +vNVDIMM label emulation and live migration). + +References +-- + +[1] SNIA NVM Programming Model: https://www.snia.org/sites/default/files/technical_work/final/NVMProgrammingModel_v1.2.pdf +[2] PMDK: http://pmem.io/pmdk/ diff --git a/exec.c b/exec.c index f2082fa..f066705 100644 --- a/exec.c +++ b/exec.c @@ -2061,6 +2061,9 @@ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr, Error *local_err = NULL; int64_t file_size; +/* Just support these ram flags by now. */ +assert(ram_flags == 0 || (ram_flags & (RAM_SHARED | RAM_PMEM))); + if (xen_enabled()) { error_setg(errp, "-mem-path not supported with Xen"); return NULL; @@ -3971,6 +3974,11 @@ err: return ret; } +bool ramblock_is_pmem(RAMBlock *rb) +{ +return rb->flags & RAM_PMEM; +} + #endif void page_size_init(void) @@ -4069,3 +4077,4 @@ void mtree_print_dispatch(fprintf_function mon, void *f, } #endif + diff --git a/include/exec/memory.h b/include/exec/memory.h index 3b68a43..6523512 100644 --- a/include/exec/memory.h +++ b/include/exec/memory.h @@ -119,6 +119,11 @@ typedef struct IOMMUNotifier IOMMUNotifier; */ #define RAM_UF_ZEROPAGE (1 << 3) +/* QEMU_RAM_PMEM is avail
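On the assert added to qemu_ram_alloc_from_fd(): as written, it passes whenever at least one of RAM_SHARED or RAM_PMEM is set, even if other, unsupported bits are set as well. If the intent is "only these flags are supported", a mask test is one way to express that. The sketch below compares the two checks. The function names are hypothetical, and the value of RAM_PMEM is an assumption, since its definition is cut off in the message above.

```c
#include <stdbool.h>
#include <stdint.h>

#define RAM_SHARED     (1 << 1)
#define RAM_RESIZEABLE (1 << 2)
/* ASSUMPTION: the real value of RAM_PMEM is truncated in the posted
 * message; (1 << 4) is a placeholder so that this sketch compiles. */
#define RAM_PMEM       (1 << 4)

/* The condition from the posted assert(): true if no flag is set or if
 * either supported flag is set, regardless of any other bits. */
static bool flags_ok_as_posted(uint64_t ram_flags)
{
    return ram_flags == 0 || (ram_flags & (RAM_SHARED | RAM_PMEM)) != 0;
}

/* Stricter alternative: true only if no bit outside the supported set
 * is present. */
static bool flags_ok_strict(uint64_t ram_flags)
{
    return (ram_flags & ~(uint64_t)(RAM_SHARED | RAM_PMEM)) == 0;
}
```

For example, RAM_SHARED | RAM_RESIZEABLE satisfies the posted condition but is rejected by the strict one.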
Re: [Qemu-devel] [PATCH v2 01/20] cutils: Provide strchrnul
On Thu, 31 May 2018 21:25:56 -0400 Keno Fischer wrote: > strchrnul is a GNU extension and thus unavailable on a number of targets. > In the review for a commit removing strchrnul from 9p, I was asked to > create a qemu_strchrnul helper to factor out this functionality. > Do so, and use it in a number of other places in the code base that inlined > the replacement pattern in a place where strchrnul could be used. > > Signed-off-by: Keno Fischer > --- > And possibly we could detect in configure if the host has strchrnul() and use it, but this optimization can be done later. I haven't checked if there could be other candidates in the current code base though. Also, this patch touches some other subsystems, so I'm Cc'ing to the other maintainers as reported by ./scripts/get_maintainer.pl: Greg Kurz (supporter:virtio-9p) Markus Armbruster (supporter:QMP) "Dr. David Alan Gilbert" (maintainer:Human Monitor (HMP)) qemu-devel@nongnu.org (open list:All patches CC here) Anyway, Acked-by: Greg Kurz > Changes since v1: New patch > > hw/9pfs/9p-local.c| 2 +- > include/qemu/cutils.h | 1 + > monitor.c | 8 ++-- > util/cutils.c | 13 + > util/qemu-option.c| 6 +- > util/uri.c| 6 ++ > 6 files changed, 20 insertions(+), 16 deletions(-) > > diff --git a/hw/9pfs/9p-local.c b/hw/9pfs/9p-local.c > index b37b1db..bcf2798 100644 > --- a/hw/9pfs/9p-local.c > +++ b/hw/9pfs/9p-local.c > @@ -65,7 +65,7 @@ int local_open_nofollow(FsContext *fs_ctx, const char > *path, int flags, > assert(*path != '/'); > > head = g_strdup(path); > -c = strchrnul(path, '/'); > +c = qemu_strchrnul(path, '/'); > if (*c) { > /* Intermediate path element */ > head[c - path] = 0; > diff --git a/include/qemu/cutils.h b/include/qemu/cutils.h > index a663340..bc40c30 100644 > --- a/include/qemu/cutils.h > +++ b/include/qemu/cutils.h > @@ -122,6 +122,7 @@ int qemu_strnlen(const char *s, int max_len); > * Returns: the pointer originally in @input. 
> */ > char *qemu_strsep(char **input, const char *delim); > +const char *qemu_strchrnul(const char *s, int c); > time_t mktimegm(struct tm *tm); > int qemu_fdatasync(int fd); > int fcntl_setfl(int fd, int flag); > diff --git a/monitor.c b/monitor.c > index 922cfc0..e1f01c4 100644 > --- a/monitor.c > +++ b/monitor.c > @@ -798,9 +798,7 @@ static int compare_cmd(const char *name, const char *list) > p = list; > for(;;) { > pstart = p; > -p = strchr(p, '|'); > -if (!p) > -p = pstart + strlen(pstart); > +p = qemu_strchrnul(p, '|'); > if ((p - pstart) == len && !memcmp(pstart, name, len)) > return 1; > if (*p == '\0') > @@ -3401,9 +3399,7 @@ static void cmd_completion(Monitor *mon, const char > *name, const char *list) > p = list; > for(;;) { > pstart = p; > -p = strchr(p, '|'); > -if (!p) > -p = pstart + strlen(pstart); > +p = qemu_strchrnul(p, '|'); > len = p - pstart; > if (len > sizeof(cmd) - 2) > len = sizeof(cmd) - 2; > diff --git a/util/cutils.c b/util/cutils.c > index 0de69e6..6e078b0 100644 > --- a/util/cutils.c > +++ b/util/cutils.c > @@ -545,6 +545,19 @@ int qemu_strtou64(const char *nptr, const char **endptr, > int base, > } > > /** > + * Searches for the first occurrence of 'c' in 's', and returns a pointer > + * to the trailing null byte if none was found. 
> + */ > +const char *qemu_strchrnul(const char *s, int c) > +{ > +const char *e = strchr(s, c); > +if (!e) { > +e = s + strlen(s); > +} > +return e; > +} > + > +/** > * parse_uint: > * > * @s: String to parse > diff --git a/util/qemu-option.c b/util/qemu-option.c > index 58d1c23..54eca12 100644 > --- a/util/qemu-option.c > +++ b/util/qemu-option.c > @@ -77,11 +77,7 @@ const char *get_opt_value(const char *p, char **value) > > *value = NULL; > while (1) { > -offset = strchr(p, ','); > -if (!offset) { > -offset = p + strlen(p); > -} > - > +offset = qemu_strchrnul(p, ','); > length = offset - p; > if (*offset != '\0' && *(offset + 1) == ',') { > length++; > diff --git a/util/uri.c b/util/uri.c > index 8624a7a..8bdef84 100644 > --- a/util/uri.c > +++ b/util/uri.c > @@ -52,6 +52,7 @@ > */ > > #include "qemu/osdep.h" > +#include "qemu/cutils.h" > > #include "qemu/uri.h" > > @@ -2266,10 +2267,7 @@ struct QueryParams *query_params_parse(const char > *query) > /* Find the next separator, or end of the string. */ > end = strchr(query, '&'); > if (!end) { > -end = strchr(query, ';'); > -} > -if (!end) { > -end = query + strlen(query); > +end = qemu_strchrnul(query, ';'); > } > > /* Find the first '=' char
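For reference, the helper's behaviour can be checked in isolation. The sketch below copies the logic of qemu_strchrnul() from the util/cutils.c hunk under a hypothetical name (my_strchrnul), and adds a hypothetical first_token_len() that uses it the way compare_cmd() does for '|'-separated lists.

```c
#include <stddef.h>
#include <string.h>

/* Same logic as the qemu_strchrnul() in the quoted patch, renamed so it
 * compiles on its own: return a pointer to the first 'c' in 's', or to
 * the terminating NUL byte if 'c' does not occur. */
static const char *my_strchrnul(const char *s, int c)
{
    const char *e = strchr(s, c);
    if (!e) {
        e = s + strlen(s);
    }
    return e;
}

/* Hypothetical helper: length of the first '|'-separated token. Because
 * the result of my_strchrnul() is never NULL, no extra branch is needed. */
static size_t first_token_len(const char *list)
{
    return (size_t)(my_strchrnul(list, '|') - list);
}
```

The never-NULL return value is what lets the monitor.c and qemu-option.c call sites drop their "if (!p)" fallbacks.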
[Qemu-devel] [PATCH V6 7/7] migration/ram: ensure write persistence on loading all data to PMEM.
From: Junyan He

We need to make sure that the data in pmem-backed memory is synced after
migration, so call pmem_persist() when the migration finishes. This
ensures that the pmem data is safe and is not lost if power goes off.

Signed-off-by: Junyan He
---
 include/qemu/pmem.h | 1 +
 migration/ram.c     | 8 ++++++++
 stubs/pmem.c        | 4 ++++
 3 files changed, 13 insertions(+)

diff --git a/include/qemu/pmem.h b/include/qemu/pmem.h
index 00d6680..b1e1b5c 100644
--- a/include/qemu/pmem.h
+++ b/include/qemu/pmem.h
@@ -17,6 +17,7 @@
 #else /* !CONFIG_LIBPMEM */
 
 void *pmem_memcpy_persist(void *pmemdest, const void *src, size_t len);
+void *pmem_persist(const void *addr, size_t len);
 
 #endif /* CONFIG_LIBPMEM */
 
diff --git a/migration/ram.c b/migration/ram.c
index aa0c6f0..09525b2 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -33,6 +33,7 @@
 #include "qemu/bitops.h"
 #include "qemu/bitmap.h"
 #include "qemu/main-loop.h"
+#include "qemu/pmem.h"
 #include "xbzrle.h"
 #include "ram.h"
 #include "migration.h"
@@ -3046,6 +3047,13 @@ static int ram_load_setup(QEMUFile *f, void *opaque)
 static int ram_load_cleanup(void *opaque)
 {
     RAMBlock *rb;
+
+    RAMBLOCK_FOREACH(rb) {
+        if (ramblock_is_pmem(rb)) {
+            pmem_persist(rb->host, rb->used_length);
+        }
+    }
+
     xbzrle_load_cleanup();
     compress_threads_load_cleanup();
 
diff --git a/stubs/pmem.c b/stubs/pmem.c
index b4ec72d..c5bc6d6 100644
--- a/stubs/pmem.c
+++ b/stubs/pmem.c
@@ -17,3 +17,7 @@ void *pmem_memcpy_persist(void *pmemdest, const void *src, size_t len)
 {
     return memcpy(pmemdest, src, len);
 }
+
+void *pmem_persist(const void *addr, size_t len)
+{
+}
-- 
2.7.4
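A note on the stub prototype: the fallback pmem_persist() is declared as returning void * but its body is empty, so it falls off the end without returning a value. libpmem itself declares pmem_persist() as returning void, while pmem_memcpy_persist() returns the destination pointer. A minimal sketch of stubs that follow those prototypes is below; the stub_ prefix is hypothetical and is only there so the example cannot clash with a real libpmem.

```c
#include <stddef.h>
#include <string.h>

/* Fallback for pmem_memcpy_persist(): without libpmem there is nothing
 * to flush, so a plain memcpy() is the whole operation. */
static void *stub_pmem_memcpy_persist(void *pmemdest, const void *src,
                                      size_t len)
{
    return memcpy(pmemdest, src, len);
}

/* Fallback for pmem_persist(): returns void, like the libpmem
 * prototype, and intentionally does nothing. */
static void stub_pmem_persist(const void *addr, size_t len)
{
    (void)addr;
    (void)len;
}
```

Matching the library prototype in both include/qemu/pmem.h and stubs/pmem.c would also keep the two build configurations consistent for callers.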
Re: [Qemu-devel] Recording I/O activity after KVM does a VMEXIT
Dear Pavel,

Thank you for providing me with all the details.

Let us take an example of a network packet. In icount mode, when the
network backend receives a network packet, you record the whole packet
with the help of the replay filter. This packet will be written to the
log file. When the time comes for replay, you stop accepting any packets
from the network backend and directly inject all of the packets that you
have already recorded in the log file into the guest address space
memory.

Am I correct in understanding this?

Thanks and Regards,
Arnab

On Fri, Jun 1, 2018 at 1:31 AM, Pavel Dovgalyuk wrote:

> Hi,
>
> I’m not familiar with KVM, but I know successful attempts of replaying the
> execution by logging IO and MMIO in TCG mode.
>
> The difference in CPU I/O and VM I/O is the following. In icount we record
> anything coming into the VM, but not into the CPU.
>
> It means that the whole packet is recorded. Virtual hardware behaves
> deterministically and therefore CPU will get identical
> input in case of replay, because the whole recorded packet is injected
> again by the filter.
>
> Pavel Dovgalyuk
>
> *From:* Arnabjyoti Kalita [mailto:akal...@cs.stonybrook.edu]
> *Sent:* Thursday, May 31, 2018 11:14 PM
> *To:* Pavel Dovgalyuk
> *Cc:* Stefan Hajnoczi; qemu-devel@nongnu.org; Pavel Dovgalyuk
> *Subject:* Re: [Qemu-devel] Recording I/O activity after KVM does a VMEXIT
>
> Dear Pavel,
>
> Thank you for your answer. I am not being able to understand the
> difference between CPU I/Os and VM I/Os. Would any network packet that
> comes into the Guest OS from the outside be a part of VM I/O or CPU I/O ? I
> am only interested in "recording" and "replaying" those network packets
> that come from the outside into the networking backend and not the other
> way around. Say for example when I get a VMExit because of the arrival of a
> network packet, I will use the VMExit reason : "KVM_EXIT_MMIO" to trace
> back to "e1000_mmio_write()" which I expect should be enough to record
> network packets that come from the outside and write to the guest address
> space for "e1000" devices. In such a case, I think I will not have to use
> the "network-filter" backend that you use to record VM I/O only. Let me
> know if you find errors in my approach.
>
> I will try to see how I can record disk packets. If disk packets use other
> ways of writing to the guest memory apart from a normal VMExit, I will try
> to find it out. Eventually I hope that it will use one of the available
> disk front-end functions to write to the guest memory from the disk, just
> like e1000 does with an "e1000_mmio_write()" call.
>
> Thanks and best regards,
> Arnab
>
> On Thu, May 31, 2018 at 8:44 AM, Pavel Dovgalyuk wrote:
>
> > From: Stefan Hajnoczi [mailto:stefa...@gmail.com]
> > On Wed, May 30, 2018 at 11:19:13PM -0400, Arnabjyoti Kalita wrote:
> > > I am trying to implement a 'minimal' record-replay mechanism for KVM, which
> > > is similar to the one existing for TCG via -icount. I am trying to record
> > > I/O events only (specifically disk and network events) when KVM does a
> > > VMEXIT. This has led me to the function kvm_cpu_exec where I can clearly
> > > see the different ways of handling all of the possible VMExit cases (like
> > > PIO, MMIO etc.). To record network packets, I am working with the e1000
> > > hardware device.
> > >
> > > Can I make sure that all of the network I/O, at least for the e1000 device
> > > happens through the KVM_EXIT_MMIO case and subsequent use of the
> > > address_space_rw() function ? Do I also need to look at other functions as
> > > well ? Also for recording disk activity, can I make sure that looking out
> > > for the KVM_EXIT_MMIO and/or KVM_EXIT_PIO cases in the vmexit mechanism,
> > > will be enough ?
> > >
> > > Let me know if there are other details that I need to take care of. I am
> > > using QEMU 2.11 on a x86-64 CPU and the guest runs a Linux Kernel 4.4 with
> > > Ubuntu 16.04.
>
> The main icount-based record/replay advantage is that we don't record
> any CPU IO. We record only VM IO (e.g., by using the network filter).
>
> Disk devices may transfer data to CPU using DMA, therefore intercepting
> only VMExit cases will not be enough.
>
> Pavel Dovgalyuk
Re: [Qemu-devel] [PATCH V6 0/7] nvdimm: guarantee persistence of QEMU writes to persistent memory
Hi, This series failed docker-mingw@fedora build test. Please find the testing commands and their output below. If you have Docker installed, you can probably reproduce it locally. Type: series Message-id: 1527840629-18648-1-git-send-email-junyan...@gmx.com Subject: [Qemu-devel] [PATCH V6 0/7] nvdimm: guarantee persistence of QEMU writes to persistent memory === TEST SCRIPT BEGIN === #!/bin/bash set -e git submodule update --init dtc # Let docker tests dump environment info export SHOW_ENV=1 export J=8 time make docker-test-mingw@fedora === TEST SCRIPT END === Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384 Switched to a new branch 'test' 4ddbee3d9b migration/ram: ensure write persistence on loading all data to PMEM. 1e8b882644 migration/ram: Add check and info message to nvdimm post copy. fbc70b5463 mem/nvdimm: ensure write persistence to PMEM in label emulation 10e5fa15d3 configure: add libpmem support 8f7f12852b hostmem-file: add the 'pmem' option ab99d47fbf memory, exec: switch file ram allocation functions to 'flags' parameters 08c1d813ef memory, exec: Expose all memory block related flags. === OUTPUT BEGIN === Submodule 'dtc' (git://git.qemu-project.org/dtc.git) registered for path 'dtc' Cloning into '/var/tmp/patchew-tester-tmp-4kz3s9ce/src/dtc'... Submodule path 'dtc': checked out 'e54388015af1fb4bf04d0bca99caba1074d9cc42' BUILD fedora make[1]: Entering directory '/var/tmp/patchew-tester-tmp-4kz3s9ce/src' GEN /var/tmp/patchew-tester-tmp-4kz3s9ce/src/docker-src.2018-06-01-04.25.22.560/qemu.tar Cloning into '/var/tmp/patchew-tester-tmp-4kz3s9ce/src/docker-src.2018-06-01-04.25.22.560/qemu.tar.vroot'... done. Your branch is up-to-date with 'origin/test'. Submodule 'dtc' (git://git.qemu-project.org/dtc.git) registered for path 'dtc' Cloning into '/var/tmp/patchew-tester-tmp-4kz3s9ce/src/docker-src.2018-06-01-04.25.22.560/qemu.tar.vroot/dtc'... 
Submodule path 'dtc': checked out 'e54388015af1fb4bf04d0bca99caba1074d9cc42' Submodule 'ui/keycodemapdb' (git://git.qemu.org/keycodemapdb.git) registered for path 'ui/keycodemapdb' Cloning into '/var/tmp/patchew-tester-tmp-4kz3s9ce/src/docker-src.2018-06-01-04.25.22.560/qemu.tar.vroot/ui/keycodemapdb'... Submodule path 'ui/keycodemapdb': checked out '6b3d716e2b6472eb7189d3220552280ef3d832ce' COPYRUNNER RUN test-mingw in qemu:fedora Packages installed: PyYAML-3.12-5.fc27.x86_64 SDL2-devel-2.0.7-2.fc27.x86_64 bc-1.07.1-3.fc27.x86_64 bison-3.0.4-8.fc27.x86_64 bluez-libs-devel-5.48-3.fc27.x86_64 brlapi-devel-0.6.6-8.fc27.x86_64 bzip2-1.0.6-24.fc27.x86_64 bzip2-devel-1.0.6-24.fc27.x86_64 ccache-3.3.6-1.fc27.x86_64 clang-5.0.1-5.fc27.x86_64 device-mapper-multipath-devel-0.7.1-9.git847cc43.fc27.x86_64 findutils-4.6.0-16.fc27.x86_64 flex-2.6.1-5.fc27.x86_64 gcc-7.3.1-5.fc27.x86_64 gcc-c++-7.3.1-5.fc27.x86_64 gettext-0.19.8.1-12.fc27.x86_64 git-2.14.3-3.fc27.x86_64 glib2-devel-2.54.3-2.fc27.x86_64 glusterfs-api-devel-3.12.7-1.fc27.x86_64 gnutls-devel-3.5.18-2.fc27.x86_64 gtk3-devel-3.22.26-2.fc27.x86_64 hostname-3.18-4.fc27.x86_64 libaio-devel-0.3.110-9.fc27.x86_64 libasan-7.3.1-5.fc27.x86_64 libattr-devel-2.4.47-21.fc27.x86_64 libcap-devel-2.25-7.fc27.x86_64 libcap-ng-devel-0.7.8-5.fc27.x86_64 libcurl-devel-7.55.1-10.fc27.x86_64 libfdt-devel-1.4.6-1.fc27.x86_64 libpng-devel-1.6.31-1.fc27.x86_64 librbd-devel-12.2.4-1.fc27.x86_64 libssh2-devel-1.8.0-5.fc27.x86_64 libubsan-7.3.1-5.fc27.x86_64 libusbx-devel-1.0.21-4.fc27.x86_64 libxml2-devel-2.9.7-1.fc27.x86_64 llvm-5.0.1-6.fc27.x86_64 lzo-devel-2.08-11.fc27.x86_64 make-4.2.1-4.fc27.x86_64 mingw32-SDL-1.2.15-9.fc27.noarch mingw32-bzip2-1.0.6-9.fc27.noarch mingw32-curl-7.54.1-2.fc27.noarch mingw32-glib2-2.54.1-1.fc27.noarch mingw32-gmp-6.1.2-2.fc27.noarch mingw32-gnutls-3.5.13-2.fc27.noarch mingw32-gtk2-2.24.31-4.fc27.noarch mingw32-gtk3-3.22.16-1.fc27.noarch mingw32-libjpeg-turbo-1.5.1-3.fc27.noarch 
mingw32-libpng-1.6.29-2.fc27.noarch mingw32-libssh2-1.8.0-3.fc27.noarch mingw32-libtasn1-4.13-1.fc27.noarch mingw32-nettle-3.3-3.fc27.noarch mingw32-pixman-0.34.0-3.fc27.noarch mingw32-pkg-config-0.28-9.fc27.x86_64 mingw64-SDL-1.2.15-9.fc27.noarch mingw64-bzip2-1.0.6-9.fc27.noarch mingw64-curl-7.54.1-2.fc27.noarch mingw64-glib2-2.54.1-1.fc27.noarch mingw64-gmp-6.1.2-2.fc27.noarch mingw64-gnutls-3.5.13-2.fc27.noarch mingw64-gtk2-2.24.31-4.fc27.noarch mingw64-gtk3-3.22.16-1.fc27.noarch mingw64-libjpeg-turbo-1.5.1-3.fc27.noarch mingw64-libpng-1.6.29-2.fc27.noarch mingw64-libssh2-1.8.0-3.fc27.noarch mingw64-libtasn1-4.13-1.fc27.noarch mingw64-nettle-3.3-3.fc27.noarch mingw64-pixman-0.34.0-3.fc27.noarch mingw64-pkg-config-0.28-9.fc27.x86_64 ncurses-devel-6.0-13.20170722.fc27.x86_64 nettle-devel-3.4-1.fc27.x86_64 nss-devel-3.36.0-1.0.fc27.x86_64 numactl-devel-2.0.11-5.fc27.x86_64 package libjpeg-devel is not installed perl-5.26.1-403.fc27.x86_64 pixman-devel-0.34.0-4.fc27.x86_64 python3-3.6.2-13.fc27.x86_64 snappy-devel-1.1.4-5.fc27.x86_64 sparse-0.5.1-2.fc27.x86_64 spice-server-devel-0.14.0-1.fc27.x
Re: [Qemu-devel] [PATCH v8 00/11] qemu-img convert with copy offloading
Hi,

This series failed build test on s390x host. Please find the details below.

Type: series
Message-id: 20180601062849.28641-1-f...@redhat.com
Subject: [Qemu-devel] [PATCH v8 00/11] qemu-img convert with copy offloading

=== TEST SCRIPT BEGIN ===
#!/bin/bash
# Testing script will be invoked under the git checkout with
# HEAD pointing to a commit that has the patches applied on top of "base"
# branch
set -e
echo "=== ENV ==="
env
echo "=== PACKAGES ==="
rpm -qa
echo "=== TEST BEGIN ==="
CC=$HOME/bin/cc
INSTALL=$PWD/install
BUILD=$PWD/build
echo -n "Using CC: "
realpath $CC
mkdir -p $BUILD $INSTALL
SRC=$PWD
cd $BUILD
$SRC/configure --cc=$CC --prefix=$INSTALL
make -j4
# XXX: we need reliable clean up
# make check -j4 V=1
make install
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
 * [new tag] patchew/20180601062849.28641-1-f...@redhat.com -> patchew/20180601062849.28641-1-f...@redhat.com
Switched to a new branch 'test'
5c03bb24cc qemu-img: Convert with copy offloading
51e732e62a block-backend: Add blk_co_copy_range
92edadadac iscsi: Implement copy offloading
6f8d57827a iscsi: Create and use iscsi_co_wait_for_task
5940e7ebff iscsi: Query and save device designator when opening
7619b4045d file-posix: Implement bdrv_co_copy_range
af0dbb042d qcow2: Implement copy offloading
d1550d576c raw: Implement copy offloading
61c824c950 raw: Check byte range uniformly
35c506afdd block: Introduce API for copy offloading
b5683be886 docker: Update fedora image to 28

=== OUTPUT BEGIN ===
=== ENV ===
LANG=en_US.UTF-8
XDG_SESSION_ID=212219
USER=fam
PWD=/var/tmp/patchew-tester-tmp-2l7s8dte/src
HOME=/home/fam
SHELL=/bin/sh
SHLVL=2
PATCHEW=/home/fam/patchew/patchew-cli -s http://patchew.org --nodebug
LOGNAME=fam
DBUS_SESSION_BUS_ADDRESS=unix:path=/run/user/1012/bus
XDG_RUNTIME_DIR=/run/user/1012
PATH=/usr/bin:/bin
_=/usr/bin/env
=== PACKAGES ===
gpg-pubkey-873529b8-54e386ff
glibc-debuginfo-common-2.24-10.fc25.s390x
fedora-release-26-1.noarch dejavu-sans-mono-fonts-2.35-4.fc26.noarch xemacs-filesystem-21.5.34-22.20170124hgf412e9f093d4.fc26.noarch bash-4.4.12-7.fc26.s390x libSM-1.2.2-5.fc26.s390x libmpc-1.0.2-6.fc26.s390x libaio-0.3.110-7.fc26.s390x libverto-0.2.6-7.fc26.s390x perl-Scalar-List-Utils-1.48-1.fc26.s390x iptables-libs-1.6.1-2.fc26.s390x tcl-8.6.6-2.fc26.s390x libxshmfence-1.2-4.fc26.s390x expect-5.45-23.fc26.s390x perl-Thread-Queue-3.12-1.fc26.noarch perl-encoding-2.19-6.fc26.s390x keyutils-1.5.10-1.fc26.s390x gmp-devel-6.1.2-4.fc26.s390x enchant-1.6.0-16.fc26.s390x python-gobject-base-3.24.1-1.fc26.s390x python3-enchant-1.6.10-1.fc26.noarch python-lockfile-0.11.0-6.fc26.noarch python2-pyparsing-2.1.10-3.fc26.noarch python2-lxml-4.1.1-1.fc26.s390x librados2-10.2.7-2.fc26.s390x trousers-lib-0.3.13-7.fc26.s390x libdatrie-0.2.9-4.fc26.s390x libsoup-2.58.2-1.fc26.s390x passwd-0.79-9.fc26.s390x bind99-libs-9.9.10-3.P3.fc26.s390x python3-rpm-4.13.0.2-1.fc26.s390x systemd-233-7.fc26.s390x virglrenderer-0.6.0-1.20170210git76b3da97b.fc26.s390x s390utils-ziomon-1.36.1-3.fc26.s390x s390utils-osasnmpd-1.36.1-3.fc26.s390x libXrandr-1.5.1-2.fc26.s390x libglvnd-glx-1.0.0-1.fc26.s390x texlive-ifxetex-svn19685.0.5-33.fc26.2.noarch texlive-psnfss-svn33946.9.2a-33.fc26.2.noarch texlive-dvipdfmx-def-svn40328-33.fc26.2.noarch texlive-natbib-svn20668.8.31b-33.fc26.2.noarch texlive-xdvi-bin-svn40750-33.20160520.fc26.2.s390x texlive-cm-svn32865.0-33.fc26.2.noarch texlive-beton-svn15878.0-33.fc26.2.noarch texlive-fpl-svn15878.1.002-33.fc26.2.noarch texlive-mflogo-svn38628-33.fc26.2.noarch texlive-texlive-docindex-svn41430-33.fc26.2.noarch texlive-luaotfload-bin-svn34647.0-33.20160520.fc26.2.noarch texlive-koma-script-svn41508-33.fc26.2.noarch texlive-pst-tree-svn24142.1.12-33.fc26.2.noarch texlive-breqn-svn38099.0.98d-33.fc26.2.noarch texlive-xetex-svn41438-33.fc26.2.noarch gstreamer1-plugins-bad-free-1.12.3-1.fc26.s390x xorg-x11-font-utils-7.5-33.fc26.s390x 
ghostscript-fonts-5.50-36.fc26.noarch libXext-devel-1.3.3-5.fc26.s390x libusbx-devel-1.0.21-2.fc26.s390x libglvnd-devel-1.0.0-1.fc26.s390x emacs-25.3-3.fc26.s390x alsa-lib-devel-1.1.4.1-1.fc26.s390x kbd-2.0.4-2.fc26.s390x dconf-0.26.0-2.fc26.s390x mc-4.8.19-5.fc26.s390x doxygen-1.8.13-9.fc26.s390x dpkg-1.18.24-1.fc26.s390x libtdb-1.3.13-1.fc26.s390x python2-pynacl-1.1.1-1.fc26.s390x perl-Filter-1.58-1.fc26.s390x python2-pip-9.0.1-11.fc26.noarch dnf-2.7.5-2.fc26.noarch bind-license-9.11.2-1.P1.fc26.noarch libtasn1-4.13-1.fc26.s390x cpp-7.3.1-2.fc26.s390x pkgconf-1.3.12-2.fc26.s390x python2-fedora-0.10.0-1.fc26.noarch cmake-filesystem-3.10.1-11.fc26.s390x python3-requests-kerberos-0.12.0-1.fc26.noarch libmicrohttpd-0.9.59-1.fc26.s390x GeoIP-GeoLite-data-2018.01-1.fc26.noarch python2-libs-2.7.14-7.fc26.s390x libidn2-2.0.4-3.fc26.s390x p11-kit-devel-0.23.10-1.fc26.s390x perl-Errno-1.25-396.fc26.s390x libdrm-2.4.90-2.fc26.s390x sssd-common-1.16.1-1.fc26.s390x boost-random-1.63.0-11.fc26.s390x urw-fonts-2.4-
Re: [Qemu-devel] [PATCH v2 01/20] cutils: Provide strchrnul
* Greg Kurz (gr...@kaod.org) wrote:
> On Thu, 31 May 2018 21:25:56 -0400
> Keno Fischer wrote:
>
> > strchrnul is a GNU extension and thus unavailable on a number of targets.
> > In the review for a commit removing strchrnul from 9p, I was asked to
> > create a qemu_strchrnul helper to factor out this functionality.
> > Do so, and use it in a number of other places in the code base that inlined
> > the replacement pattern in a place where strchrnul could be used.
> >
> > Signed-off-by: Keno Fischer
> > ---
>
> And possibly we could detect in configure if the host has strchrnul() and
> use it, but this optimization can be done later.
>
> I haven't checked if there could be other candidates in the current code
> base though. Also, this patch touches some other subsystems, so I'm Cc'ing
> to the other maintainers as reported by ./scripts/get_maintainer.pl:

That looks fine from my point of view; I can see you could probably also
use it in the code at the start of the while loop in hmp_sendkey:

    while (1) {
        separator = strchr(keys, '-');
        keyname_len = separator ? separator - keys : strlen(keys);

Dave

> Greg Kurz (supporter:virtio-9p)
> Markus Armbruster (supporter:QMP)
> "Dr. David Alan Gilbert" (maintainer:Human Monitor (HMP))
> qemu-devel@nongnu.org (open list:All patches CC here)
>
> Anyway,
>
> Acked-by: Greg Kurz
>
> > Changes since v1: New patch
> >
> >  hw/9pfs/9p-local.c    | 2 +-
> >  include/qemu/cutils.h | 1 +
> >  monitor.c             | 8 ++--
> >  util/cutils.c         | 13 +
> >  util/qemu-option.c    | 6 +-
> >  util/uri.c            | 6 ++
> >  6 files changed, 20 insertions(+), 16 deletions(-)
> >
> > diff --git a/hw/9pfs/9p-local.c b/hw/9pfs/9p-local.c
> > index b37b1db..bcf2798 100644
> > --- a/hw/9pfs/9p-local.c
> > +++ b/hw/9pfs/9p-local.c
> > @@ -65,7 +65,7 @@ int local_open_nofollow(FsContext *fs_ctx, const char *path, int flags,
> >      assert(*path != '/');
> >
> >      head = g_strdup(path);
> > -    c = strchrnul(path, '/');
> > +    c = qemu_strchrnul(path, '/');
> >      if (*c) {
> >          /* Intermediate path element */
> >          head[c - path] = 0;
> > diff --git a/include/qemu/cutils.h b/include/qemu/cutils.h
> > index a663340..bc40c30 100644
> > --- a/include/qemu/cutils.h
> > +++ b/include/qemu/cutils.h
> > @@ -122,6 +122,7 @@ int qemu_strnlen(const char *s, int max_len);
> >   * Returns: the pointer originally in @input.
> >   */
> >  char *qemu_strsep(char **input, const char *delim);
> > +const char *qemu_strchrnul(const char *s, int c);
> >  time_t mktimegm(struct tm *tm);
> >  int qemu_fdatasync(int fd);
> >  int fcntl_setfl(int fd, int flag);
> > diff --git a/monitor.c b/monitor.c
> > index 922cfc0..e1f01c4 100644
> > --- a/monitor.c
> > +++ b/monitor.c
> > @@ -798,9 +798,7 @@ static int compare_cmd(const char *name, const char *list)
> >      p = list;
> >      for(;;) {
> >          pstart = p;
> > -        p = strchr(p, '|');
> > -        if (!p)
> > -            p = pstart + strlen(pstart);
> > +        p = qemu_strchrnul(p, '|');
> >          if ((p - pstart) == len && !memcmp(pstart, name, len))
> >              return 1;
> >          if (*p == '\0')
> > @@ -3401,9 +3399,7 @@ static void cmd_completion(Monitor *mon, const char *name, const char *list)
> >      p = list;
> >      for(;;) {
> >          pstart = p;
> > -        p = strchr(p, '|');
> > -        if (!p)
> > -            p = pstart + strlen(pstart);
> > +        p = qemu_strchrnul(p, '|');
> >          len = p - pstart;
> >          if (len > sizeof(cmd) - 2)
> >              len = sizeof(cmd) - 2;
> > diff --git a/util/cutils.c b/util/cutils.c
> > index 0de69e6..6e078b0 100644
> > --- a/util/cutils.c
> > +++ b/util/cutils.c
> > @@ -545,6 +545,19 @@ int qemu_strtou64(const char *nptr, const char **endptr, int base,
> >  }
> >
> >  /**
> > + * Searches for the first occurrence of 'c' in 's', and returns a pointer
> > + * to the trailing null byte if none was found.
> > + */
> > +const char *qemu_strchrnul(const char *s, int c)
> > +{
> > +    const char *e = strchr(s, c);
> > +    if (!e) {
> > +        e = s + strlen(s);
> > +    }
> > +    return e;
> > +}
> > +
> > +/**
> >   * parse_uint:
> >   *
> >   * @s: String to parse
> > diff --git a/util/qemu-option.c b/util/qemu-option.c
> > index 58d1c23..54eca12 100644
> > --- a/util/qemu-option.c
> > +++ b/util/qemu-option.c
> > @@ -77,11 +77,7 @@ const char *get_opt_value(const char *p, char **value)
> >
> >      *value = NULL;
> >      while (1) {
> > -        offset = strchr(p, ',');
> > -        if (!offset) {
> > -            offset = p + strlen(p);
> > -        }
> > -
> > +        offset = qemu_strchrnul(p, ',');
> >          length = offset - p;
> >          if (*offset != '\0' && *(offset + 1) == ',') {
> >              length++;
> > diff --git a/util/uri.c b/util/uri.c
> > index 8624a7a..8bdef84 100644
> > ---
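Dave's hmp_sendkey remark in the message above can be sketched as follows. This is a minimal standalone illustration, not QEMU code: `my_strchrnul` is a hypothetical stand-in that mirrors the helper from the patch, and `first_key_len` is an invented name for the length computation that the sendkey loop performs.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the qemu_strchrnul() helper in the patch:
 * returns a pointer to the first 'c' in 's', or to the trailing NUL
 * byte if 'c' does not occur. */
static const char *my_strchrnul(const char *s, int c)
{
    const char *e = strchr(s, c);
    return e ? e : s + strlen(s);
}

/* Illustrative only: length of the first '-'-separated key name,
 * i.e. the value the sendkey loop stores in keyname_len. */
static size_t first_key_len(const char *keys)
{
    return (size_t)(my_strchrnul(keys, '-') - keys);
}
```

With a helper of this shape, the `separator ? separator - keys : strlen(keys)` conditional collapses into a single pointer subtraction, because the returned pointer is never NULL.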
Re: [Qemu-devel] [PATCH 00/33] linux-user: Begin splitting do_syscall
Hi,

This series seems to have some coding style problems. See output below for
more information:

Type: series
Message-id: 20180601073050.8054-1-richard.hender...@linaro.org
Subject: [Qemu-devel] [PATCH 00/33] linux-user: Begin splitting do_syscall

=== TEST SCRIPT BEGIN ===
#!/bin/bash

BASE=base
n=1
total=$(git log --oneline $BASE.. | wc -l)
failed=0

git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram

commits="$(git log --format=%H --reverse $BASE..)"
for c in $commits; do
    echo "Checking PATCH $n/$total: $(git log -n 1 --format=%s $c)..."
    if ! git show $c --format=email | ./scripts/checkpatch.pl --mailback -; then
        failed=1
        echo
    fi
    n=$((n+1))
done

exit $failed
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
 * [new tag] patchew/20180601073050.8054-1-richard.hender...@linaro.org -> patchew/20180601073050.8054-1-richard.hender...@linaro.org
Switched to a new branch 'test'
747f98cda4 linux-user: Split out rt_sigqueueinfo, rt_sigtimedwait, rt_tgsigqueueinfo
8c59503fd9 linux-user: Split out rt_sigpending, rt_sigsuspend, sigpending, sigsuspend
3a57cc334f linux-user: Split out rt_sigprocmask, sgetmask, sigprocmask, ssetmask
81bd8e94dd linux-user: Split out rt_sigaction, sigaction
0fc1031ca6 linux-user: Split out getpgrp, getppid, setsid
76b6ab61e4 linux-user: Split out chroot, dup2, dup3, fcntl, setpgid, umask
818843d921 linux-user: Split out ioctl
d3a9fa76f4 linux-user: Split out acct, pipe, pipe2, times, umount2
73775770a5 linux-user: Split out dup, mkdir, mkdirat, rmdir
8d6ff832d7 linux-user: Split out rename, renameat, renameat2
ded97414bf linux-user: Split out access, faccessat, futimesat, kill, nice, sync, syncfs
f2ac2715a0 linux-user: Split out alarm, pause, stime, utime, utimes
fd239f018f linux-user: Split out mount, umount
7f8d08b0df linux-user: Split out getpid, getxpid, lseek
0033fce107 linux-user: Remove all unimplemented entries
1d093d6966 linux-user: Split out chdir, mknod, mknodat, time, chmod
d0bc8c69af linux-user: Split out unlink, unlinkat
ee1804088c linux-user: Split out link, linkat
7da1a7d2ec linux-user: Split out creat, fork, waitid, waitpid
6c9db8aee2 linux-user: Split out open_to_handle_at
8e8c59cd27 linux-user: Split out name_to_handle_at
4ed7c56516 linux-user: Split out open, openat
dbf85fddf7 linux-user: Split out execve
d4654a6ef7 linux-user: Split out brk, close, exit, read, write
404318016d linux-user: Set up infrastructure for table-izing syscalls
a7b3ac0407 linux-user: Make syscall number unsigned
6086f3339b linux-user: Propagate goto fail to return
1969ae08d3 linux-user: Split out goto unimplemented to do_unimplemented
f3213e38ff linux-user: Propagate goto unimplemented_nowarn to return
4e20509f56 linux-user: Propagate goto efault to return
dda83e01f8 linux-user: Propagate goto ebadf to return
fddfe2eb57 linux-user: Relax single exit from "break"
bac309b293 linux-user: Split out do_syscall1

=== OUTPUT BEGIN ===
Checking PATCH 1/33: linux-user: Split out do_syscall1...
Checking PATCH 2/33: linux-user: Relax single exit from "break"...
ERROR: code indent should never use tabs
#1929: FILE: linux-user/syscall.c:11150:
+^Ireturn ret;$

ERROR: code indent should never use tabs
#1938: FILE: linux-user/syscall.c:11159:
+^Ireturn ret;$

ERROR: code indent should never use tabs
#1947: FILE: linux-user/syscall.c:11166:
+^Ireturn target_ftruncate64(cpu_env, arg1, arg2, arg3, arg4);$

ERROR: code indent should never use tabs
#2411: FILE: linux-user/syscall.c:11862:
+^Ireturn ret;$

ERROR: code indent should never use tabs
#2683: FILE: linux-user/syscall.c:12216:
+^Ireturn ret;$

total: 5 errors, 0 warnings, 2853 lines checked

Your patch has style problems, please review. If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

Checking PATCH 3/33: linux-user: Propagate goto ebadf to return...
Checking PATCH 4/33: linux-user: Propagate goto efault to return...
ERROR: suspect code indent for conditional statements (11, 14)
#642: FILE: linux-user/syscall.c:9553:
           if (!p) {
+              return -TARGET_EFAULT;

total: 1 errors, 0 warnings, 1211 lines checked

Your patch has style problems, please review. If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

Checking PATCH 5/33: linux-user: Propagate goto unimplemented_nowarn to return...
Checking PATCH 6/33: linux-user: Split out goto unimplemented to do_unimplemented...
Checking PATCH 7/33: linux-user: Propagate goto fail to return...
Checking PATCH 8/33: linux-user: Make syscall number unsigned...
Checking PATCH 9/33: linux-user: Set up infrastructure for table-izing syscalls...
Checking PATCH 10/33: linux-user: Split out brk, close, exit, read, write...
Checking PATCH 11/33: linux-user: Split out execve...
Checking PATCH 12/33: linux-u
Re: [Qemu-devel] [PATCH v3 15/22] target/arm: Add ARM_FEATURE_V7VE for v7 Virtualization Extensions
On 31 May 2018 at 21:39, Aaron Lindsay wrote:
> On May 31 15:18, Peter Maydell wrote:
>>    if (arm_feature(env, ARM_FEATURE_V7VE)) {
>>        /* v7 Virtualization Extensions. In real hardware this implies
>>         * EL2 and also the presence of the Security Extensions.
>>         * For QEMU, for backwards-compatibility we implement some
>>         * CPUs or CPU configs which have no actual EL2 or EL3 but do
>>         * include the various other features that V7VE implies.
>>         * Presence of EL2 itself is ARM_FEATURE_EL2, and of the
>>         * Security Extensions is ARM_FEATURE_EL3.
>>         */
>>        set_feature(env, ARM_FEATURE_ARM_DIV);
>
> Is it safe to assume from your comment above regarding keeping ARM_DIV
> separate from V7VE that the inclusion of it here is an oversight and
> that only LPAE and V7 should be set if V7VE is? (and that V8 should
> now directly imply both V7VE and ARM_DIV?)

No; V7VE always implies ARM_DIV. (ARM_DIV doesn't imply V7VE, though,
which is why it is a separate feature bit.)

thanks
-- PMM
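The one-way implication Peter describes can be sketched as below. This is a minimal sketch under stated assumptions: the `FEAT_*` bits and both function names are hypothetical and do not match QEMU's real feature enum, and the extra V7 and LPAE implications are taken from Aaron's reading of the patch rather than from the code itself.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical feature bits for illustration; not QEMU's real values. */
enum {
    FEAT_V7      = 1u << 0,
    FEAT_LPAE    = 1u << 1,
    FEAT_ARM_DIV = 1u << 2,
    FEAT_V7VE    = 1u << 3,
};

/* Expand implied features: V7VE pulls in ARM_DIV (and, by assumption,
 * V7 and LPAE), while ARM_DIV on its own implies nothing further. */
static uint32_t expand_features(uint32_t f)
{
    if (f & FEAT_V7VE) {
        f |= FEAT_V7 | FEAT_LPAE | FEAT_ARM_DIV;
    }
    return f;
}

static bool has_feature(uint32_t f, uint32_t bit)
{
    return (f & bit) != 0;
}
```

Keeping ARM_DIV as its own bit lets a CPU model advertise the divide instructions without claiming the rest of the virtualization extensions.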
Re: [Qemu-devel] [PATCH v7 04/11] hmp: disable monitor in preconfig state
* Igor Mammedov (imamm...@redhat.com) wrote:
> On Fri, 25 May 2018 16:39:34 -0300
> Eduardo Habkost wrote:
>
> > On Fri, May 25, 2018 at 08:05:30AM +0200, Markus Armbruster wrote:
> > > Eduardo Habkost writes:
> > >
> > > > On Thu, May 24, 2018 at 08:16:20PM +0200, Markus Armbruster wrote:
> > > >> Markus Armbruster writes:
> > > >>
> > > >> > Igor Mammedov writes:
> > > >> >
> > > >> >> Ban it for now, if someone would need it to work early,
> > > >> >> one would have to implement checks if HMP command is valid
> > > >> >> at preconfig state.
> > > >> >>
> > > >> >> Signed-off-by: Igor Mammedov
> > > >> >> Reviewed-by: Eric Blake
> > > >> >> ---
> > > >> >> v5:
> > > >> >>   * add 'use QMP instead" to error message, suggesting user
> > > >> >>     the right interface to use
> > > >> >> v4:
> > > >> >>   * v3 was only printing error but not preventing command execution,
> > > >> >>     Fix it by returning after printing error message.
> > > >> >>     ("Dr. David Alan Gilbert" )
> > > >> >> ---
> > > >> >>  monitor.c | 6 ++
> > > >> >>  1 file changed, 6 insertions(+)
> > > >> >>
> > > >> >> diff --git a/monitor.c b/monitor.c
> > > >> >> index 39f8ee1..0ffdf1d 100644
> > > >> >> --- a/monitor.c
> > > >> >> +++ b/monitor.c
> > > >> >> @@ -3374,6 +3374,12 @@ static void handle_hmp_command(Monitor *mon, const char *cmdline)
> > > >> >>
> > > >> >>      trace_handle_hmp_command(mon, cmdline);
> > > >> >>
> > > >> >> +    if (runstate_check(RUN_STATE_PRECONFIG)) {
> > > >> >> +        monitor_printf(mon, "HMP not available in preconfig state, "
> > > >> >> +                            "use QMP instead\n");
> > > >> >> +        return;
> > > >> >> +    }
> > > >> >> +
> > > >> >>      cmd = monitor_parse_command(mon, cmdline, &cmdline, mon->cmd_table);
> > > >> >>      if (!cmd) {
> > > >> >>          return;
> > > >> >
> > > >> > So we offer the user an HMP monitor, but we summarily fail all commands.
> > > >> > I'm sorry, but that's... searching for polite word... embarrassing. We
> > > >> > should accept HMP output only when we're ready to accept it. Yes, that
> > > >> > would involve a bit more surgery rather than this cheap hack. The whole
> > > >> > preconfig thing smells like a cheap hack to me, but let's not overdo it.
> > > >>
> > > >> Clarification: I don't think we need to hold the series because of
> > > >> this. I do think you should investigate delaying HMP until it can
> > > >> work.
> > > >
> > > > What would it mean to delay HMP? Not creating the socket?
> > > > Creating the socket but not accepting clients? Accepting clients
> > > > but not consuming any input from the socket until we are out of
> > > > preconfig?
> > > >
> > > > I'm not sure if any of those options would be better. If a human
> > > > is already trying to talk on the other side, it seems better to
> > > > show QEMU is alive (but not ready to hold a conversation yet)
> > > > than staying silent.
> > >
> > > If this
> > >
> > >     QEMU 2.12.50 monitor - type 'help' for more information
> > >     (qemu) help
> > >     HMP not available in preconfig state, use QMP instead
> > >     (qemu) quit
> > >     HMP not available in preconfig state, use QMP instead
> > >     (qemu) let me out dammit
> > >     HMP not available in preconfig state, use QMP instead
> > >     (qemu)
> > >
> > > is better than the alternatives, then I wonder how much more
> > > entertainment the alternatives could provide!
> > >
> > > We *can* do better. Start like this:
> > >
> > >     QEMU 2.12.50 monitor is not ready with -preconfig until you complete configuration with QMP
> > >
> > > and when we exit preconfig state, add:
> > >
> > >     QEMU 2.12.50 monitor - type 'help' for more information
> > >     (qemu)
> > >
> > > Note that this is upfront about the monitor not being ready, avoids
> > > misleading the user about "help", talks to the user in the user's terms
> > > (-preconfig) instead of internal terms (preconfig state), and is more
> > > specific on how to ready the monitor.
> >
> > Yes, this sounds better than any of the options I have
> > considered.
> >
> > Making at least 'help', 'quit', and 'exit-preconfig' work might
> > be even better, though.
> I'll look into both options and try to come up a patch to make it better.

Let's keep whatever we do here simple.
As I understand it, the only reason to deny HMP in preconfig is because
we've not got a per-command flag to say which commands are allowed in
preconfig state. If you're going to allow 'help', 'quit' etc then you
just end up adding that flag (which should be easy) and then we've got the
flag and we can go back and enable other HMP commands in preconfig as
well.

Dave

--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
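The per-command flag Dave mentions could look roughly like the sketch below. Everything here is an assumption made for illustration: the `Cmd` struct, the `allow_preconfig` field, the table contents and `cmd_allowed` are hypothetical names and do not reflect QEMU's actual HMP command table.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical command descriptor; QEMU's real HMP table differs. */
typedef struct {
    const char *name;
    bool allow_preconfig;   /* assumed per-command flag */
} Cmd;

/* Illustrative table: which commands are marked is an assumption. */
static const Cmd cmd_table[] = {
    { "help",           true  },
    { "quit",           true  },
    { "exit-preconfig", true  },
    { "migrate",        false },
};

/* Returns true if 'name' is a known command that may run right now. */
static bool cmd_allowed(const char *name, bool in_preconfig)
{
    for (size_t i = 0; i < sizeof(cmd_table) / sizeof(cmd_table[0]); i++) {
        if (strcmp(cmd_table[i].name, name) == 0) {
            return !in_preconfig || cmd_table[i].allow_preconfig;
        }
    }
    return false;   /* unknown command */
}
```

The blanket `runstate_check()` rejection in the patch would then become a per-command check made after the command has been looked up, so enabling another command in preconfig only means flipping its flag.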
Re: [Qemu-devel] [PATCH 1/6] gdbstub: Return the fd from gdbserver_start
On 31 May 2018 at 23:49, Richard Henderson wrote:
> This will allow us to protect gdbserver_fd from the guest.

Ha, I hadn't realised we already had an internal-to-QEMU
file descriptor :-)

thanks
-- PMM
[Qemu-devel] [Bug 1774605] [NEW] PowerPC guest does not emulate L2 and L3 cache for KVM vCPUs
Public bug reported:

PowerPC KVM guest does not emulate L2 and L3 caches for vCPUs. It would
be good to have them enabled unless there are known issues or limitations
with PowerPC.

Host Env:
kernel: 4.17.0-rc7-00045-g0512e0134582
qemu: v2.12.0-923-gc181ddaa17-dirty
# libvirtd -V
libvirtd (libvirt) 4.4.0

Guest Kernel:
# uname -a
Linux atest-guest 4.17.0-rc7-00045-g0512e0134582 #9 SMP Fri Jun 1 02:55:50 EDT 2018 ppc64le ppc64le ppc64le GNU/Linux

Guest:
# lscpu
Architecture:        ppc64le
Byte Order:          Little Endian
CPU(s):              16
On-line CPU(s) list: 0-15
Thread(s) per core:  8
Core(s) per socket:  2
Socket(s):           1
NUMA node(s):        1
Model:               2.1 (pvr 004b 0201)
Model name:          POWER8 (architected), altivec supported
Hypervisor vendor:   KVM
Virtualization type: para
L1d cache:           64K
L1i cache:           32K
NUMA node0 CPU(s):   0-15

Background: x86 enables the CPU L2 cache by default and the L3 cache on
demand for KVM guests, and claims a performance improvement because vCPUs
benefit from fewer `vmexits due to guest send IPIs` with the L3 cache
enabled. The patch for that is:
https://git.qemu.org/?p=qemu.git;a=commit;h=14c985cffa6cb177fc01a163d8bcf227c104718c

** Affects: qemu
     Importance: Undecided
         Status: New

--
You received this bug notification because you are a member of
qemu-devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1774605

Title:
  PowerPC guest does not emulate L2 and L3 cache for KVM vCPUs

Status in QEMU:
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1774605/+subscriptions
[Qemu-devel] [Bug 1774605] Re: PowerPC guest does not emulate L2 and L3 cache for KVM vCPUs
Guest xml(cpu portion):
...
32 /machine hvm /home/kvmci/linux/vmlinux root=/dev/sda2 rw console=tty0 console=ttyS0,115200 init=/sbin/init initcall_debug
...

Host lscpu:
# lscpu
Architecture:         ppc64le
Byte Order:           Little Endian
CPU(s):               80
On-line CPU(s) list:  0,8,16,24,32,40,48,56,64,72
Off-line CPU(s) list: 1-7,9-15,17-23,25-31,33-39,41-47,49-55,57-63,65-71,73-79
Thread(s) per core:   1
Core(s) per socket:   5
Socket(s):            2
NUMA node(s):         2
Model:                2.1 (pvr 004b 0201)
Model name:           POWER8E (raw), altivec supported
CPU max MHz:          3690.
CPU min MHz:          2061.
L1d cache:            64K
L1i cache:            32K
L2 cache:             512K
L3 cache:             8192K
NUMA node0 CPU(s):    0,8,16,24,32
NUMA node1 CPU(s):    40,48,56,64,72

--
https://bugs.launchpad.net/bugs/1774605
Re: [Qemu-devel] [PATCH v2 02/20] 9p: proxy: Fix size passed to `connect`
On Thu, 31 May 2018 21:25:57 -0400
Keno Fischer wrote:

> The size to pass to the `connect` call is the size of the entire
> `struct sockaddr_un`. Passing anything shorter than this causes errors
> on darwin.
>

From the linux unix(7) manual page:

    ret = connect (data_socket, (const struct sockaddr *) &addr,
                   sizeof(struct sockaddr_un));

Not sure why it was done differently, but I definitely prefer the fixed size
version.

Applied to 9p-next. Thanks !

> Signed-off-by: Keno Fischer
> ---
>
> Changes since v1: New patch
>
>  hw/9pfs/9p-proxy.c | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
>
> diff --git a/hw/9pfs/9p-proxy.c b/hw/9pfs/9p-proxy.c
> index e2e0329..47a94e0 100644
> --- a/hw/9pfs/9p-proxy.c
> +++ b/hw/9pfs/9p-proxy.c
> @@ -1088,7 +1088,7 @@ static int proxy_ioc_getversion(FsContext *fs_ctx, V9fsPath *path,
>
>  static int connect_namedsocket(const char *path, Error **errp)
>  {
> -    int sockfd, size;
> +    int sockfd;
>      struct sockaddr_un helper;
>
>      if (strlen(path) >= sizeof(helper.sun_path)) {
> @@ -1102,8 +1102,7 @@ static int connect_namedsocket(const char *path, Error **errp)
>      }
>      strcpy(helper.sun_path, path);
>      helper.sun_family = AF_UNIX;
> -    size = strlen(helper.sun_path) + sizeof(helper.sun_family);
> -    if (connect(sockfd, (struct sockaddr *)&helper, size) < 0) {
> +    if (connect(sockfd, (struct sockaddr *)&helper, sizeof(helper)) < 0) {
>          error_setg_errno(errp, errno, "failed to connect to '%s'", path);
>          close(sockfd);
>          return -1;
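The fixed-size `connect` call from the patch can be tried outside QEMU with a sketch like the following. It is a simplified stand-in for `connect_namedsocket`, assuming plain POSIX sockets; `connect_unix` is a hypothetical name and QEMU's `Error **errp` reporting is replaced by a bare -1 return.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Connect to a UNIX domain stream socket at 'path'.
 * Returns the connected fd, or -1 on any error. */
static int connect_unix(const char *path)
{
    struct sockaddr_un addr;
    int fd;

    if (strlen(path) >= sizeof(addr.sun_path)) {
        return -1;      /* path plus its NUL byte would not fit */
    }
    fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0) {
        return -1;
    }
    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strcpy(addr.sun_path, path);    /* safe: length checked above */

    /* Pass the size of the whole structure, as the patch does. */
    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

Zeroing the structure first keeps the unused tail of `sun_path` well defined when the full `sizeof(addr)` is handed to the kernel.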
[Qemu-devel] An emulation failure occurs, if I hotplug vcpus immediately after the VM start
Hi there,

I am doing some tests on QEMU vCPU hotplug and have run into some trouble.
An emulation failure occurs and QEMU prints the following message:

KVM internal error. Suberror: 1
emulation failure
EAX= EBX= ECX= EDX=0600
ESI= EDI= EBP= ESP=fff8
EIP=ff53 EFL=00010082 [--S] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES = 9300
CS =f000 000f 9b00
SS = 9300
DS = 9300
FS = 9300
GS = 9300
LDT= 8200
TR = 8b00if
GDT=
IDT=
CR0=6010 CR2= CR3= CR4=
DR0= DR1= DR2= DR3=
DR6=0ff0 DR7=0400
EFER=
Code=31 d2 eb 04 66 83 ca ff 66 89 d0 66 5b 66 c3 66 89 d0 66 c3 66 68 21 8a 00 00 e9 08 d7 66 56 66 53 66 83 ec 0c 66 89 c3 66 e8 ce 7b ff ff 66 89 c6

I notice that the guest is still running SeaBIOS in real mode when the vCPU
has just been plugged. This emulation failure can be reproduced reliably if
I hotplug a vCPU during the VM launch process. After some digging, I found
that this KVM internal error shows up because KVM cannot emulate some MMIO
(gpa 0xfff53).

So I am confused:
(1) Does QEMU support vCPU hotplug while the guest is still running SeaBIOS?
(2) The gpa (0xfff53) is an address in the BIOS ROM section; why does KVM
    incorrectly treat it as an MMIO address?
Re: [Qemu-devel] [PULL 00/25] target-arm queue
On 31 May 2018 at 17:00, Peter Maydell wrote:
> target-arm queue. This has the "plumb txattrs through various
> bits of exec.c" patches, and a collection of bug fixes from
> various people.
>
> v2: fix compile error on arm hosts...
>
> thanks
> -- PMM
>
>
> The following changes since commit a3ac12fba028df90f7b3dbec924995c126c41022:
>
>   Merge remote-tracking branch 'remotes/ehabkost/tags/numa-next-pull-request' into staging (2018-05-31 11:12:36 +0100)
>
> are available in the Git repository at:
>
>   git://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20180531-1
>
> for you to fetch changes up to 2f15b79280cf71b7991dfd3f0312a1797630e376:
>
>   KVM: GIC: Fix memory leak due to calling kvm_init_irq_routing twice (2018-05-31 16:32:35 +0100)
>

Applied, thanks.

-- PMM
[Qemu-devel] [PATCH] file-posix: Consolidate the locking error message
When hot-plugging a block device fails due to image locking errors,
users won't see the helpful 'Is another process using the image?'
message in QMP because currently the error hint is not carried over
there. Even though extending QMP to include hint is a conceivably easy
task, Libvirt will need some change to consume that data. Before that
is fully sorted out, let's just do the easy fix by joining the two
lines.

Signed-off-by: Fam Zheng
---
 block/file-posix.c         | 10 ++--
 tests/qemu-iotests/153.out | 99 +-
 tests/qemu-iotests/182.out | 3 +-
 3 files changed, 38 insertions(+), 74 deletions(-)

diff --git a/block/file-posix.c b/block/file-posix.c
index 5a602cfe37..03776e13b1 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -699,11 +699,10 @@ static int raw_check_lock_bytes(BDRVRawState *s,
             if (ret) {
                 char *perm_name = bdrv_perm_names(p);
                 error_setg(errp,
-                           "Failed to get \"%s\" lock",
+                           "Failed to get \"%s\" lock. "
+                           "Is another process using the image?",
                            perm_name);
                 g_free(perm_name);
-                error_append_hint(errp,
-                                  "Is another process using the image?\n");
                 return ret;
             }
         }
@@ -716,11 +715,10 @@ static int raw_check_lock_bytes(BDRVRawState *s,
             if (ret) {
                 char *perm_name = bdrv_perm_names(p);
                 error_setg(errp,
-                           "Failed to get shared \"%s\" lock",
+                           "Failed to get shared \"%s\" lock. "
+                           "Is another process using the image?",
                            perm_name);
                 g_free(perm_name);
-                error_append_hint(errp,
-                                  "Is another process using the image?\n");
                 return ret;
             }
         }
diff --git a/tests/qemu-iotests/153.out b/tests/qemu-iotests/153.out
index 2510762ba1..e256a9f714 100644
--- a/tests/qemu-iotests/153.out
+++ b/tests/qemu-iotests/153.out
@@ -11,86 +11,67 @@ Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=33554432 backing_file=TEST_DIR/t
 == Launching QEMU, opts: '' ==

 == Launching another QEMU, opts: '' ==
-QEMU_PROG: -drive file=TEST_DIR/t.qcow2,if=none,: Failed to get "write" lock
-Is another process using the image?
+QEMU_PROG: -drive file=TEST_DIR/t.qcow2,if=none,: Failed to get "write" lock. Is another process using the image?

 == Launching another QEMU, opts: 'read-only=on' ==
-QEMU_PROG: -drive file=TEST_DIR/t.qcow2,if=none,read-only=on: Failed to get shared "write" lock
-Is another process using the image?
+QEMU_PROG: -drive file=TEST_DIR/t.qcow2,if=none,read-only=on: Failed to get shared "write" lock. Is another process using the image?

 == Launching another QEMU, opts: 'read-only=on,force-share=on' ==

 == Running utility commands ==

 _qemu_io_wrapper -c read 0 512 TEST_DIR/t.qcow2
-can't open device TEST_DIR/t.qcow2: Failed to get "write" lock
-Is another process using the image?
+can't open device TEST_DIR/t.qcow2: Failed to get "write" lock. Is another process using the image?

 _qemu_io_wrapper -r -c read 0 512 TEST_DIR/t.qcow2
-can't open device TEST_DIR/t.qcow2: Failed to get shared "write" lock
-Is another process using the image?
+can't open device TEST_DIR/t.qcow2: Failed to get shared "write" lock. Is another process using the image?

 _qemu_io_wrapper -c open TEST_DIR/t.qcow2 -c read 0 512
-can't open device TEST_DIR/t.qcow2: Failed to get "write" lock
-Is another process using the image?
+can't open device TEST_DIR/t.qcow2: Failed to get "write" lock. Is another process using the image?
 no file open, try 'help open'

 _qemu_io_wrapper -c open -r TEST_DIR/t.qcow2 -c read 0 512
-can't open device TEST_DIR/t.qcow2: Failed to get shared "write" lock
-Is another process using the image?
+can't open device TEST_DIR/t.qcow2: Failed to get shared "write" lock. Is another process using the image?
 no file open, try 'help open'

 _qemu_img_wrapper info TEST_DIR/t.qcow2
-qemu-img: Could not open 'TEST_DIR/t.qcow2': Failed to get shared "write" lock
-Is another process using the image?
+qemu-img: Could not open 'TEST_DIR/t.qcow2': Failed to get shared "write" lock. Is another process using the image?

 _qemu_img_wrapper check TEST_DIR/t.qcow2
-qemu-img: Could not open 'TEST_DIR/t.qcow2': Failed to get shared "write" lock
-Is another process using the image?
+qemu-img: Could not open 'TEST_DIR/t.qcow2': Failed to get shared "write" lock. Is another process using the image?

 _qemu_img_wrapper compare TEST_DIR/t.qcow2 TEST_DIR/t.qcow2
-qemu-img: Could not open 'TEST_DIR/t.qcow2': Failed to get shared "write" lock
-Is another process using the image?
+qemu-img: Could not open 'TEST_DIR/t.qcow2': Failed to get shared "write" lock. Is another p
Re: [Qemu-devel] [PATCH v2 03/20] 9p: xattr: Fix crash due to free of uninitialized value
On Thu, 31 May 2018 21:25:58 -0400
Keno Fischer wrote:

> If the size returned from llistxattr is 0, we skipped the malloc
> call, leaving xattr.value uninitialized. However, this value is
> later passed to `g_free` without any further checks, causing an

Ouch, good catch.

> error. Fix that by always calling g_malloc unconditionally. If
> `size` is 0, it will return a pointer that is safe to pass to g_free,
> likely NULL.
>

"Allocates n_bytes bytes of memory, initialized to 0's. If n_bytes is 0 it
returns NULL."

https://developer.gnome.org/glib/unstable/glib-Memory-Allocation.html#g-malloc

The fix is good, but it seems the same can also happen if
v9fs_co_lgetxattr() returns 0 a few lines below. Can you check this out
and fix it if needed?

> Signed-off-by: Keno Fischer
> ---
>
> Changes since v1: New patch
>
>  hw/9pfs/9p.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c
> index d74302d..b80db65 100644
> --- a/hw/9pfs/9p.c
> +++ b/hw/9pfs/9p.c
> @@ -3256,8 +3256,8 @@ static void coroutine_fn v9fs_xattrwalk(void *opaque)
>          xattr_fidp->fs.xattr.len = size;
>          xattr_fidp->fid_type = P9_FID_XATTR;
>          xattr_fidp->fs.xattr.xattrwalk_fid = true;
> +        xattr_fidp->fs.xattr.value = g_malloc0(size);
>          if (size) {
> -            xattr_fidp->fs.xattr.value = g_malloc0(size);
>              err = v9fs_co_llistxattr(pdu, &xattr_fidp->path,
>                                       xattr_fidp->fs.xattr.value,
>                                       xattr_fidp->fs.xattr.len);
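For illustration only, here is a minimal standalone sketch of the pattern the fix relies on: assign the pointer on every path, so that the later free is always safe. It does not use GLib or the 9p code. The struct and the helpers (xattr_buf, zalloc_or_null, xattr_buf_init, xattr_buf_release) are hypothetical names, and zalloc_or_null only imitates the documented g_malloc0() behaviour of returning NULL for a zero size.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the fid's xattr state; not QEMU's real struct. */
struct xattr_buf {
    void  *value;
    size_t len;
};

/* Imitates g_malloc0(): zero-filled memory, or NULL when size is 0. */
static void *zalloc_or_null(size_t size)
{
    void *p;

    if (size == 0) {
        return NULL;
    }
    p = calloc(1, size);
    if (!p) {
        abort();            /* like GLib, treat allocation failure as fatal */
    }
    return p;
}

/* The pointer is assigned unconditionally, even when size == 0. */
static void xattr_buf_init(struct xattr_buf *b, size_t size)
{
    b->len = size;
    b->value = zalloc_or_null(size);
}

/* Safe on every path because value is either NULL or a live allocation. */
static void xattr_buf_release(struct xattr_buf *b)
{
    free(b->value);         /* free(NULL) is a no-op */
    b->value = NULL;
    b->len = 0;
}
```

With the assignment inside an "if (size)" block instead, a zero-size buffer would leave value holding whatever was in memory before, and the release step would free a garbage pointer.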
Re: [Qemu-devel] [PATCH v8 00/11] qemu-img convert with copy offloading
On Thu, 05/31 23:45, no-re...@patchew.org wrote: > Hi, > > This series failed build test on s390x host. Please find the details below. > > Type: series > Message-id: 20180601062849.28641-1-f...@redhat.com > Subject: [Qemu-devel] [PATCH v8 00/11] qemu-img convert with copy offloading > > === TEST SCRIPT BEGIN === > #!/bin/bash > # Testing script will be invoked under the git checkout with > # HEAD pointing to a commit that has the patches applied on top of "base" > # branch > set -e > echo "=== ENV ===" > env > echo "=== PACKAGES ===" > rpm -qa > echo "=== TEST BEGIN ===" > CC=$HOME/bin/cc > INSTALL=$PWD/install > BUILD=$PWD/build > echo -n "Using CC: " > realpath $CC > mkdir -p $BUILD $INSTALL > SRC=$PWD > cd $BUILD > $SRC/configure --cc=$CC --prefix=$INSTALL > make -j4 > # XXX: we need reliable clean up > # make check -j4 V=1 > make install > === TEST SCRIPT END === > > Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384 > From https://github.com/patchew-project/qemu > * [new tag] patchew/20180601062849.28641-1-f...@redhat.com -> > patchew/20180601062849.28641-1-f...@redhat.com > Switched to a new branch 'test' > 5c03bb24cc qemu-img: Convert with copy offloading > 51e732e62a block-backend: Add blk_co_copy_range > 92edadadac iscsi: Implement copy offloading > 6f8d57827a iscsi: Create and use iscsi_co_wait_for_task > 5940e7ebff iscsi: Query and save device designator when opening > 7619b4045d file-posix: Implement bdrv_co_copy_range > af0dbb042d qcow2: Implement copy offloading > d1550d576c raw: Implement copy offloading > 61c824c950 raw: Check byte range uniformly > 35c506afdd block: Introduce API for copy offloading > b5683be886 docker: Update fedora image to 28 > > === OUTPUT BEGIN === > === ENV === > LANG=en_US.UTF-8 > XDG_SESSION_ID=212219 > USER=fam > PWD=/var/tmp/patchew-tester-tmp-2l7s8dte/src > HOME=/home/fam > SHELL=/bin/sh > SHLVL=2 > PATCHEW=/home/fam/patchew/patchew-cli -s http://patchew.org --nodebug > LOGNAME=fam > 
DBUS_SESSION_BUS_ADDRESS=unix:path=/run/user/1012/bus > XDG_RUNTIME_DIR=/run/user/1012 > PATH=/usr/bin:/bin > _=/usr/bin/env > === PACKAGES === > gpg-pubkey-873529b8-54e386ff > glibc-debuginfo-common-2.24-10.fc25.s390x > fedora-release-26-1.noarch > dejavu-sans-mono-fonts-2.35-4.fc26.noarch > xemacs-filesystem-21.5.34-22.20170124hgf412e9f093d4.fc26.noarch > bash-4.4.12-7.fc26.s390x > libSM-1.2.2-5.fc26.s390x > libmpc-1.0.2-6.fc26.s390x > libaio-0.3.110-7.fc26.s390x > libverto-0.2.6-7.fc26.s390x > perl-Scalar-List-Utils-1.48-1.fc26.s390x > iptables-libs-1.6.1-2.fc26.s390x > tcl-8.6.6-2.fc26.s390x > libxshmfence-1.2-4.fc26.s390x > expect-5.45-23.fc26.s390x > perl-Thread-Queue-3.12-1.fc26.noarch > perl-encoding-2.19-6.fc26.s390x > keyutils-1.5.10-1.fc26.s390x > gmp-devel-6.1.2-4.fc26.s390x > enchant-1.6.0-16.fc26.s390x > python-gobject-base-3.24.1-1.fc26.s390x > python3-enchant-1.6.10-1.fc26.noarch > python-lockfile-0.11.0-6.fc26.noarch > python2-pyparsing-2.1.10-3.fc26.noarch > python2-lxml-4.1.1-1.fc26.s390x > librados2-10.2.7-2.fc26.s390x > trousers-lib-0.3.13-7.fc26.s390x > libdatrie-0.2.9-4.fc26.s390x > libsoup-2.58.2-1.fc26.s390x > passwd-0.79-9.fc26.s390x > bind99-libs-9.9.10-3.P3.fc26.s390x > python3-rpm-4.13.0.2-1.fc26.s390x > systemd-233-7.fc26.s390x > virglrenderer-0.6.0-1.20170210git76b3da97b.fc26.s390x > s390utils-ziomon-1.36.1-3.fc26.s390x > s390utils-osasnmpd-1.36.1-3.fc26.s390x > libXrandr-1.5.1-2.fc26.s390x > libglvnd-glx-1.0.0-1.fc26.s390x > texlive-ifxetex-svn19685.0.5-33.fc26.2.noarch > texlive-psnfss-svn33946.9.2a-33.fc26.2.noarch > texlive-dvipdfmx-def-svn40328-33.fc26.2.noarch > texlive-natbib-svn20668.8.31b-33.fc26.2.noarch > texlive-xdvi-bin-svn40750-33.20160520.fc26.2.s390x > texlive-cm-svn32865.0-33.fc26.2.noarch > texlive-beton-svn15878.0-33.fc26.2.noarch > texlive-fpl-svn15878.1.002-33.fc26.2.noarch > texlive-mflogo-svn38628-33.fc26.2.noarch > texlive-texlive-docindex-svn41430-33.fc26.2.noarch > 
texlive-luaotfload-bin-svn34647.0-33.20160520.fc26.2.noarch > texlive-koma-script-svn41508-33.fc26.2.noarch > texlive-pst-tree-svn24142.1.12-33.fc26.2.noarch > texlive-breqn-svn38099.0.98d-33.fc26.2.noarch > texlive-xetex-svn41438-33.fc26.2.noarch > gstreamer1-plugins-bad-free-1.12.3-1.fc26.s390x > xorg-x11-font-utils-7.5-33.fc26.s390x > ghostscript-fonts-5.50-36.fc26.noarch > libXext-devel-1.3.3-5.fc26.s390x > libusbx-devel-1.0.21-2.fc26.s390x > libglvnd-devel-1.0.0-1.fc26.s390x > emacs-25.3-3.fc26.s390x > alsa-lib-devel-1.1.4.1-1.fc26.s390x > kbd-2.0.4-2.fc26.s390x > dconf-0.26.0-2.fc26.s390x > mc-4.8.19-5.fc26.s390x > doxygen-1.8.13-9.fc26.s390x > dpkg-1.18.24-1.fc26.s390x > libtdb-1.3.13-1.fc26.s390x > python2-pynacl-1.1.1-1.fc26.s390x > perl-Filter-1.58-1.fc26.s390x > python2-pip-9.0.1-11.fc26.noarch > dnf-2.7.5-2.fc26.noarch > bind-license-9.11.2-1.P1.fc26.noarch > libtasn1-4.13-1.fc26.s390x > cpp-7.3.1-2.fc26.s390x > pkgconf-1.3.12-2.fc26.s390x > python2-fedora-0.10.0-1.fc26.noarch > cmake-filesystem-3.10.1-11.fc26.s390x > python3-requests
Re: [Qemu-devel] Questions about the flow of interrupt simulation
On 1 June 2018 at 07:17, Eva Chen wrote:
> 1. There are two kinds of interrupt: edge triggered and level triggered.
> I have seen two code segment related to the level: gic_set_irq() and
> arm_cpu_set_irq().
> In gic_set_irq(), if level == GIC_TEST_LEVEL(irq, cm), which means the
> level is not changed, will return.
> In arm_cpu_set_irq() said that if level ==1, call cpu_interrupt(). if
> level==0, call cpu_reset_interrupt(), which will clean up that irq bits..
> Does that mean all interrupt in arm are level triggered(high level)?
> How to know the triggered type of interrupt?

This is mixing up interrupts in two different places.

For the Arm architecture, IRQ and FIQ are always level-sensitive:
the thing which sets them (the GIC, typically) has to set them
and keep them set until the CPU acknowledges them.

For the GIC, its input interrupts may be either level sensitive
or edge sensitive. This is configurable for each interrupt on
GICv2 by writing to the GICD_ICFGRn registers. The gic_set_irq()
code implements the behaviour that the GIC specification requires,
depending on whether the ICFGRn register says that interrupt should
be edge or level triggered.

Other interrupt controllers that QEMU models may behave differently.
(For instance the ARMv7M NVIC is different again.)

> 2. interrupt signal will be passed through GIC from device to CPU. There
> are four types of interrupt in CPU: CPU_INTERRUPT_HARD/FIQ/VIRQ/VFIQ.
> Where exactly define the CPU_INTERRUPT_{type} that device's interrupt
> corresponded?

Again, this is configurable by the guest by writing to GIC
registers. In the GICv2, the GICD_IGROUPRn registers set whether
the interrupt should be in "group 0" or "group 1". Group 1
interrupts always cause an IRQ; group 0 interrupts cause either
IRQ or FIQ depending on the setting of the GICC_CTLR FIQEn bit.
(The expected use is that interrupts configured for use with
the TrustZone Secure World will use FIQ and those configured
for use with the NonSecure World will use IRQ.) VIRQ and VFIQ
are for when the GIC and CPU support the Virtualization Extension.

The behaviour of all of this is defined by the GIC specification;
QEMU just has to implement what the hardware does.

> 3. I have seen others device's code under qemu/hw directory. Almost all
> device will call qemu_set_irq() at the end of device's read/write. Is that
> for the purpose of a device to tell CPU that it has done some works?
> but the second parameter of qemu_set_irq(), level, will be set to a
> different value(not always 1 or 0), which sometimes will cause the
> interrupt return at gic_set_irq() instead of passing to CPU.
> What does the interrupt at the end of device_read/write(device_update())
> mean?

The best way to think of this is not to try to think about
whether the interrupt line is connected to the CPU or anything
else. Just think about a device model as being an emulation
of a particular bit of hardware. For instance, take the pl011 UART.
The specification for that UART says that when certain conditions
inside the device are true, the UART will assert its outgoing
interrupt line. So our model also must check those conditions and
call qemu_set_irq() to raise and lower the interrupt at the right
time. The common way to code this is to have a function which is
called whenever any of the relevant state has changed, which
rechecks the conditions and calls qemu_set_irq(). The "purpose" of
this code is just to behave the way the hardware behaves.

Commonly, the output IRQ line from a device is connected to an
interrupt controller and thus to a CPU, but it doesn't have to be.
On some boards, the IRQ line might not be connected to anything.
Or it might be connected up to some other device which provides
its value to the guest via a status register. Or perhaps it's ORed
together with lines from other devices and the output of the OR
gate goes to the interrupt controller.

If you're designing a board in real hardware, you are taking
various components (UART, interrupt controller, etc) and connecting
them up to implement a useful design. In QEMU, we also take various
components and connect them up, to produce the same design the
hardware has.

qemu_set_irq() is usually just modelling what in the hardware is
a simple wire. The source end calls this function to say "the wire
is at logical 0/1", and on the destination end a function is called
to handle that. It is also possible to pass in a value other than
0 or 1. This happens in two cases:
 (1) a bug, where the source end really ought to be using 0 or 1;
     often this doesn't have any visible bad effects because the
     destination end is testing 0 vs not-0, rather than 0 vs 1
 (2) we really do want to transfer an integer rather than just
     a 0-vs-1 level. This is less common and only happens when
     both ends of the "wire" know that that's the convention they
     want to use.

I think the overall theme of my reply is that fundamentally QEMU
is modelling hardware. If you want to understand why parts
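As a rough illustration of the "device model drives a wire" idea described in this reply, here is a small standalone sketch. It is not QEMU code: struct wire, wire_set(), toy_uart and toy_sink are all hypothetical names, and wire_set() is only loosely modelled on what qemu_set_irq() does. The device has one update function that rechecks its conditions and sets the line; the destination decides what a level means, and the line may also be left unconnected.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical "wire": a handler plus its opaque state. */
typedef void (*wire_handler)(void *opaque, int level);

struct wire {
    wire_handler handler;
    void *opaque;
};

/* Source end: announce the current level of the line. */
static void wire_set(struct wire *w, int level)
{
    if (w && w->handler) {          /* an unconnected line is simply ignored */
        w->handler(w->opaque, level);
    }
}

/* Toy UART: raw status bits masked by enable bits decide the line level. */
struct toy_uart {
    uint32_t int_level;
    uint32_t int_enabled;
    struct wire *irq;
};

/* Called whenever relevant state changes; rechecks the conditions. */
static void toy_uart_update(struct toy_uart *u)
{
    wire_set(u->irq, (u->int_level & u->int_enabled) != 0);
}

/* Toy destination, standing in for one interrupt controller input. */
struct toy_sink {
    int level;
    unsigned changes;
};

static void toy_sink_set(void *opaque, int level)
{
    struct toy_sink *s = opaque;

    level = !!level;                /* treat any non-zero value as "high" */
    if (level == s->level) {
        return;                     /* level unchanged, nothing to do */
    }
    s->level = level;
    s->changes++;
}
```

Calling toy_uart_update() repeatedly with unchanged state is harmless here, because the destination ignores a level that has not changed. That is the same reason a device model can call its update function after every register access.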
[Qemu-devel] [PATCH v9 01/10] block: Introduce API for copy offloading
Introduce the bdrv_co_copy_range() API for copy offloading. Block drivers implementing this API support efficient copy operations that avoid reading each block from the source device and writing it to the destination devices. Examples of copy offload primitives are SCSI EXTENDED COPY and Linux copy_file_range(2). Signed-off-by: Fam Zheng Reviewed-by: Stefan Hajnoczi --- block/io.c| 97 +++ include/block/block.h | 32 + include/block/block_int.h | 38 +++ 3 files changed, 167 insertions(+) diff --git a/block/io.c b/block/io.c index ca96b487eb..b7beaeeb9f 100644 --- a/block/io.c +++ b/block/io.c @@ -2835,3 +2835,100 @@ void bdrv_unregister_buf(BlockDriverState *bs, void *host) bdrv_unregister_buf(child->bs, host); } } + +static int coroutine_fn bdrv_co_copy_range_internal(BdrvChild *src, +uint64_t src_offset, +BdrvChild *dst, +uint64_t dst_offset, +uint64_t bytes, +BdrvRequestFlags flags, +bool recurse_src) +{ +int ret; + +if (!src || !dst || !src->bs || !dst->bs) { +return -ENOMEDIUM; +} +ret = bdrv_check_byte_request(src->bs, src_offset, bytes); +if (ret) { +return ret; +} + +ret = bdrv_check_byte_request(dst->bs, dst_offset, bytes); +if (ret) { +return ret; +} +if (flags & BDRV_REQ_ZERO_WRITE) { +return bdrv_co_pwrite_zeroes(dst, dst_offset, bytes, flags); +} + +if (!src->bs->drv->bdrv_co_copy_range_from +|| !dst->bs->drv->bdrv_co_copy_range_to +|| src->bs->encrypted || dst->bs->encrypted) { +return -ENOTSUP; +} +if (recurse_src) { +return src->bs->drv->bdrv_co_copy_range_from(src->bs, + src, src_offset, + dst, dst_offset, + bytes, flags); +} else { +return dst->bs->drv->bdrv_co_copy_range_to(dst->bs, + src, src_offset, + dst, dst_offset, + bytes, flags); +} +} + +/* Copy range from @src to @dst. + * + * See the comment of bdrv_co_copy_range for the parameter and return value + * semantics. 
*/ +int coroutine_fn bdrv_co_copy_range_from(BdrvChild *src, uint64_t src_offset, + BdrvChild *dst, uint64_t dst_offset, + uint64_t bytes, BdrvRequestFlags flags) +{ +return bdrv_co_copy_range_internal(src, src_offset, dst, dst_offset, + bytes, flags, true); +} + +/* Copy range from @src to @dst. + * + * See the comment of bdrv_co_copy_range for the parameter and return value + * semantics. */ +int coroutine_fn bdrv_co_copy_range_to(BdrvChild *src, uint64_t src_offset, + BdrvChild *dst, uint64_t dst_offset, + uint64_t bytes, BdrvRequestFlags flags) +{ +return bdrv_co_copy_range_internal(src, src_offset, dst, dst_offset, + bytes, flags, false); +} + +int coroutine_fn bdrv_co_copy_range(BdrvChild *src, uint64_t src_offset, +BdrvChild *dst, uint64_t dst_offset, +uint64_t bytes, BdrvRequestFlags flags) +{ +BdrvTrackedRequest src_req, dst_req; +BlockDriverState *src_bs = src->bs; +BlockDriverState *dst_bs = dst->bs; +int ret; + +bdrv_inc_in_flight(src_bs); +bdrv_inc_in_flight(dst_bs); +tracked_request_begin(&src_req, src_bs, src_offset, + bytes, BDRV_TRACKED_READ); +tracked_request_begin(&dst_req, dst_bs, dst_offset, + bytes, BDRV_TRACKED_WRITE); + +wait_serialising_requests(&src_req); +wait_serialising_requests(&dst_req); +ret = bdrv_co_copy_range_from(src, src_offset, + dst, dst_offset, + bytes, flags); + +tracked_request_end(&src_req); +tracked_request_end(&dst_req); +bdrv_dec_in_flight(src_bs); +bdrv_dec_in_flight(dst_bs); +return ret; +} diff --git a/include/block/block.h b/include/block/block.h index 3894edda9d..6cc6c7e699 100644 --- a/include/block/block.h +++ b/include/block/block.h @@ -611,4 +611,36 @@ bool bdrv_can_store_new_dirty_bitmap(BlockDriverState *bs, const char *name, */ void bdrv_register_buf(BlockDriverState *bs, void *host, size_t size); void bdrv_unregister_buf(BlockDriver
[Qemu-devel] [PATCH v9 00/10] qemu-img convert with copy offloading
v9: Don't break older libiscsi. [patchew]

v8: Fix compiling against new glibc and libiscsi on Fedora 28 where v7 had
    conflicting definitions. [Stefan, myself]
    - Add HAVE_COPY_FILE_RANGE in configure.
    - Drop IDENT_DESCR_TGT_DESCR from scsi constants header.

v7: Fix qcow2.

v6: Pick up rev-by from Stefan and Eric.
    Tweak patch 2 commit message.

v5: - Fix raw offset/bytes check for read. [Eric]
    - Fix qcow2_handle_l2meta. [Stefan]
    - Add coroutine_fn wherever appropriate. [Stefan]

v4: - Fix raw offset and size. [Eric]
    - iscsi: Drop unnecessary return values and variables in favor of
      constants. [Stefan]
    - qcow2: Handle small backing case. [Stefan]
    - file-posix: Translate ENOSYS to ENOTSUP. [Stefan]
    - API documentation and commit message. [Stefan]
    - Add rev-by to patches 3, 5 - 10. [Stefan, Eric]

This series introduces block layer API for copy offloading and makes use of it
in qemu-img convert.

For now we implemented the operation in local file protocol with
copy_file_range(2). Besides that, it's possible to add similar support to
iscsi, nfs and potentially more.

As far as its usage goes, in addition to qemu-img convert, we can emulate
offloading in scsi-disk (handle EXTENDED COPY command), and use the API in
block jobs too.

Fam Zheng (10):
  block: Introduce API for copy offloading
  raw: Check byte range uniformly
  raw: Implement copy offloading
  qcow2: Implement copy offloading
  file-posix: Implement bdrv_co_copy_range
  iscsi: Query and save device designator when opening
  iscsi: Create and use iscsi_co_wait_for_task
  iscsi: Implement copy offloading
  block-backend: Add blk_co_copy_range
  qemu-img: Convert with copy offloading

 block/block-backend.c          |  18
 block/file-posix.c             |  98
 block/io.c                     |  97
 block/iscsi.c                  | 314
 block/qcow2.c                  | 229
 block/raw-format.c             |  96
 configure                      |  17
 include/block/block.h          |  32
 include/block/block_int.h      |  38
 include/block/raw-aio.h        |  10
 include/scsi/constants.h       |   4
 include/sysemu/block-backend.h |   4
 qemu-img.c                     |  50
 13 files changed, 908 insertions(+), 99 deletions(-)

-- 
2.17.0
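To make the copy_file_range(2) idea concrete, here is a minimal standalone sketch of offloading a byte range between two file descriptors, with a buffered fallback when the kernel cannot do it. This is not the series' implementation: try_copy_file_range(), copy_buffered() and copy_range() are hypothetical helpers, and falling back on any failure is a simplification chosen for the example. It assumes Linux and calls the raw syscall so that it does not depend on a libc wrapper being present. The syscall reports errors by returning -1 and setting errno, and it advances both offsets by the number of bytes it copied.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Raw syscall wrapper; reports ENOSYS where the syscall number is unknown. */
static ssize_t try_copy_file_range(int in_fd, int64_t *in_off,
                                   int out_fd, int64_t *out_off, size_t len)
{
#ifdef SYS_copy_file_range
    return syscall(SYS_copy_file_range, in_fd, in_off, out_fd, out_off,
                   len, 0u);
#else
    (void)in_fd; (void)in_off; (void)out_fd; (void)out_off; (void)len;
    errno = ENOSYS;
    return -1;
#endif
}

/* Plain read/write copy, used when offloading is unavailable. */
static int copy_buffered(int in_fd, int64_t in_off,
                         int out_fd, int64_t out_off, uint64_t bytes)
{
    char buf[65536];

    while (bytes) {
        size_t want = bytes < sizeof(buf) ? (size_t)bytes : sizeof(buf);
        ssize_t done = 0;
        ssize_t n = pread(in_fd, buf, want, (off_t)in_off);

        if (n < 0) {
            if (errno == EINTR) {
                continue;
            }
            return -errno;
        }
        if (n == 0) {
            return -EIO;            /* source is shorter than requested */
        }
        while (done < n) {
            ssize_t w = pwrite(out_fd, buf + done, (size_t)(n - done),
                               (off_t)(out_off + done));
            if (w < 0) {
                if (errno == EINTR) {
                    continue;
                }
                return -errno;
            }
            if (w == 0) {
                return -EIO;
            }
            done += w;
        }
        in_off += n;
        out_off += n;
        bytes -= (uint64_t)n;
    }
    return 0;
}

/* Try to offload; on failure or no progress, copy the rest through a buffer. */
static int copy_range(int in_fd, int64_t in_off,
                      int out_fd, int64_t out_off, uint64_t bytes)
{
    while (bytes) {
        ssize_t n = try_copy_file_range(in_fd, &in_off, out_fd, &out_off,
                                        (size_t)bytes);
        if (n < 0 && errno == EINTR) {
            continue;               /* interrupted: errno carries the reason */
        }
        if (n <= 0) {
            return copy_buffered(in_fd, in_off, out_fd, out_off, bytes);
        }
        bytes -= (uint64_t)n;       /* the kernel already advanced both offsets */
    }
    return 0;
}
```

A caller would invoke copy_range(src_fd, src_offset, dst_fd, dst_offset, length) and treat a negative return value as a negated errno.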
[Qemu-devel] [PATCH v9 02/10] raw: Check byte range uniformly
We don't verify the request range against s->size in the I/O callbacks
except for raw_co_pwritev. This is inconsistent (especially for
raw_co_pwrite_zeroes and raw_co_pdiscard), so fix them; in the meantime,
make the helper reusable by the upcoming new callbacks.

Note that in most cases the block layer already verifies the request byte
range against our reported image length, before invoking the driver
callbacks. The exception is during image creation, after
blk_set_allow_write_beyond_eof(blk, true) is called. But in that case, the
requests are not directly from the user or guest. So there is no visible
behavior change in adding the check code.

The int64_t -> uint64_t inconsistency, as shown by the type casting, is
pre-existing due to the interface.

Reviewed-by: Stefan Hajnoczi
Reviewed-by: Eric Blake
Signed-off-by: Fam Zheng
---
 block/raw-format.c | 64
 1 file changed, 39 insertions(+), 25 deletions(-)

diff --git a/block/raw-format.c b/block/raw-format.c
index fe33693a2d..b69a0674b3 100644
--- a/block/raw-format.c
+++ b/block/raw-format.c
@@ -167,16 +167,37 @@ static void raw_reopen_abort(BDRVReopenState *state)
     state->opaque = NULL;
 }
 
+/* Check and adjust the offset, against 'offset' and 'size' options. */
+static inline int raw_adjust_offset(BlockDriverState *bs, uint64_t *offset,
+                                    uint64_t bytes, bool is_write)
+{
+    BDRVRawState *s = bs->opaque;
+
+    if (s->has_size && (*offset > s->size || bytes > (s->size - *offset))) {
+        /* There's not enough space for the write, or the read request is
+         * out-of-range. Don't read/write anything to prevent leaking out of
+         * the size specified in options. */
+        return is_write ? -ENOSPC : -EINVAL;
+    }
+
+    if (*offset > INT64_MAX - s->offset) {
+        return -EINVAL;
+    }
+    *offset += s->offset;
+
+    return 0;
+}
+
 static int coroutine_fn raw_co_preadv(BlockDriverState *bs, uint64_t offset,
                                       uint64_t bytes, QEMUIOVector *qiov,
                                       int flags)
 {
-    BDRVRawState *s = bs->opaque;
+    int ret;
 
-    if (offset > UINT64_MAX - s->offset) {
-        return -EINVAL;
+    ret = raw_adjust_offset(bs, &offset, bytes, false);
+    if (ret) {
+        return ret;
     }
-    offset += s->offset;
 
     BLKDBG_EVENT(bs->file, BLKDBG_READ_AIO);
     return bdrv_co_preadv(bs->file, offset, bytes, qiov, flags);
@@ -186,23 +207,11 @@ static int coroutine_fn raw_co_pwritev(BlockDriverState *bs, uint64_t offset,
                                        uint64_t bytes, QEMUIOVector *qiov,
                                        int flags)
 {
-    BDRVRawState *s = bs->opaque;
     void *buf = NULL;
     BlockDriver *drv;
     QEMUIOVector local_qiov;
     int ret;
 
-    if (s->has_size && (offset > s->size || bytes > (s->size - offset))) {
-        /* There's not enough space for the data. Don't write anything and just
-         * fail to prevent leaking out of the size specified in options. */
-        return -ENOSPC;
-    }
-
-    if (offset > UINT64_MAX - s->offset) {
-        ret = -EINVAL;
-        goto fail;
-    }
-
     if (bs->probed && offset < BLOCK_PROBE_BUF_SIZE && bytes) {
         /* Handling partial writes would be a pain - so we just
          * require that guests have 512-byte request alignment if
@@ -237,7 +246,10 @@ static int coroutine_fn raw_co_pwritev(BlockDriverState *bs, uint64_t offset,
         qiov = &local_qiov;
     }
 
-    offset += s->offset;
+    ret = raw_adjust_offset(bs, &offset, bytes, true);
+    if (ret) {
+        goto fail;
+    }
 
     BLKDBG_EVENT(bs->file, BLKDBG_WRITE_AIO);
     ret = bdrv_co_pwritev(bs->file, offset, bytes, qiov, flags);
@@ -267,22 +279,24 @@ static int coroutine_fn raw_co_pwrite_zeroes(BlockDriverState *bs,
                                              int64_t offset, int bytes,
                                              BdrvRequestFlags flags)
 {
-    BDRVRawState *s = bs->opaque;
-    if (offset > UINT64_MAX - s->offset) {
-        return -EINVAL;
+    int ret;
+
+    ret = raw_adjust_offset(bs, (uint64_t *)&offset, bytes, true);
+    if (ret) {
+        return ret;
     }
-    offset += s->offset;
     return bdrv_co_pwrite_zeroes(bs->file, offset, bytes, flags);
 }
 
 static int coroutine_fn raw_co_pdiscard(BlockDriverState *bs,
                                         int64_t offset, int bytes)
 {
-    BDRVRawState *s = bs->opaque;
-    if (offset > UINT64_MAX - s->offset) {
-        return -EINVAL;
+    int ret;
+
+    ret = raw_adjust_offset(bs, (uint64_t *)&offset, bytes, true);
+    if (ret) {
+        return ret;
     }
-    offset += s->offset;
     return bdrv_co_pdiscard(bs->file->bs, offset, bytes);
 }
 
-- 
2.17.0
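The range check in this patch can be tried in isolation. Below is a small standalone sketch of the same idea, written without any QEMU types: struct raw_window and window_adjust_offset() are hypothetical names that mirror s->has_size, s->size, s->offset and the helper above. Testing "*offset > size" first guarantees that "size - *offset" cannot underflow, and comparing bytes against that remainder avoids computing "*offset + bytes", which could wrap around. The extra "w->offset > INT64_MAX" test is an addition for the sketch, so that the subtraction stays well defined for any input.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mirror of the 'offset' and 'size' options of a raw image. */
struct raw_window {
    bool     has_size;
    uint64_t size;      /* length of the visible window */
    uint64_t offset;    /* where the window starts in the underlying file */
};

/* Validate [*offset, *offset + bytes) against the window, then translate
 * *offset into the underlying file. Returns 0 or a negative errno. */
static int window_adjust_offset(const struct raw_window *w, uint64_t *offset,
                                uint64_t bytes, bool is_write)
{
    if (w->has_size && (*offset > w->size || bytes > w->size - *offset)) {
        /* Out of range: refuse rather than touch data outside the window. */
        return is_write ? -ENOSPC : -EINVAL;
    }

    if (w->offset > (uint64_t)INT64_MAX ||
        *offset > (uint64_t)INT64_MAX - w->offset) {
        return -EINVAL;             /* the translated offset would overflow */
    }
    *offset += w->offset;

    return 0;
}
```

On failure the caller's offset is left untouched, so a request can be rejected without any cleanup.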
[Qemu-devel] [PATCH v9 05/10] file-posix: Implement bdrv_co_copy_range
With copy_file_range(2), we can implement the bdrv_co_copy_range semantics. Signed-off-by: Fam Zheng --- block/file-posix.c | 98 +++-- configure | 17 +++ include/block/raw-aio.h | 10 - 3 files changed, 120 insertions(+), 5 deletions(-) diff --git a/block/file-posix.c b/block/file-posix.c index 5a602cfe37..513d371bb1 100644 --- a/block/file-posix.c +++ b/block/file-posix.c @@ -59,6 +59,7 @@ #ifdef __linux__ #include #include +#include #include #include #include @@ -187,6 +188,8 @@ typedef struct RawPosixAIOData { #define aio_ioctl_cmd aio_nbytes /* for QEMU_AIO_IOCTL */ off_t aio_offset; int aio_type; +int aio_fd2; +off_t aio_offset2; } RawPosixAIOData; #if defined(__FreeBSD__) || defined(__FreeBSD_kernel__) @@ -1446,6 +1449,49 @@ static ssize_t handle_aiocb_write_zeroes(RawPosixAIOData *aiocb) return -ENOTSUP; } +#ifndef HAVE_COPY_FILE_RANGE +static off_t copy_file_range(int in_fd, off_t *in_off, int out_fd, + off_t *out_off, size_t len, unsigned int flags) +{ +#ifdef __NR_copy_file_range +return syscall(__NR_copy_file_range, in_fd, in_off, out_fd, + out_off, len, flags); +#else +errno = ENOSYS; +return -1; +#endif +} +#endif + +static ssize_t handle_aiocb_copy_range(RawPosixAIOData *aiocb) +{ +uint64_t bytes = aiocb->aio_nbytes; +off_t in_off = aiocb->aio_offset; +off_t out_off = aiocb->aio_offset2; + +while (bytes) { +ssize_t ret = copy_file_range(aiocb->aio_fildes, &in_off, + aiocb->aio_fd2, &out_off, + bytes, 0); +if (ret == -EINTR) { +continue; +} +if (ret < 0) { +if (errno == ENOSYS) { +return -ENOTSUP; +} else { +return -errno; +} +} +if (!ret) { +/* No progress (e.g. when beyond EOF), fall back to buffer I/O. 
*/ +return -ENOTSUP; +} +bytes -= ret; +} +return 0; +} + static ssize_t handle_aiocb_discard(RawPosixAIOData *aiocb) { int ret = -EOPNOTSUPP; @@ -1526,6 +1572,9 @@ static int aio_worker(void *arg) case QEMU_AIO_WRITE_ZEROES: ret = handle_aiocb_write_zeroes(aiocb); break; +case QEMU_AIO_COPY_RANGE: +ret = handle_aiocb_copy_range(aiocb); +break; default: fprintf(stderr, "invalid aio request (0x%x)\n", aiocb->aio_type); ret = -EINVAL; @@ -1536,9 +1585,10 @@ static int aio_worker(void *arg) return ret; } -static int paio_submit_co(BlockDriverState *bs, int fd, - int64_t offset, QEMUIOVector *qiov, - int bytes, int type) +static int paio_submit_co_full(BlockDriverState *bs, int fd, + int64_t offset, int fd2, int64_t offset2, + QEMUIOVector *qiov, + int bytes, int type) { RawPosixAIOData *acb = g_new(RawPosixAIOData, 1); ThreadPool *pool; @@ -1546,6 +1596,8 @@ static int paio_submit_co(BlockDriverState *bs, int fd, acb->bs = bs; acb->aio_type = type; acb->aio_fildes = fd; +acb->aio_fd2 = fd2; +acb->aio_offset2 = offset2; acb->aio_nbytes = bytes; acb->aio_offset = offset; @@ -1561,6 +1613,13 @@ static int paio_submit_co(BlockDriverState *bs, int fd, return thread_pool_submit_co(pool, aio_worker, acb); } +static inline int paio_submit_co(BlockDriverState *bs, int fd, + int64_t offset, QEMUIOVector *qiov, + int bytes, int type) +{ +return paio_submit_co_full(bs, fd, offset, -1, 0, qiov, bytes, type); +} + static BlockAIOCB *paio_submit(BlockDriverState *bs, int fd, int64_t offset, QEMUIOVector *qiov, int bytes, BlockCompletionFunc *cb, void *opaque, int type) @@ -2451,6 +2510,35 @@ static void raw_abort_perm_update(BlockDriverState *bs) raw_handle_perm_lock(bs, RAW_PL_ABORT, 0, 0, NULL); } +static int coroutine_fn raw_co_copy_range_from(BlockDriverState *bs, + BdrvChild *src, uint64_t src_offset, + BdrvChild *dst, uint64_t dst_offset, + uint64_t bytes, BdrvRequestFlags flags) +{ +return bdrv_co_copy_range_to(src, src_offset, dst, dst_offset, bytes, flags); +} + +static int 
coroutine_fn raw_co_copy_range_to(BlockDriverState *bs, + BdrvChild *src, uint64_t src_offset, + BdrvChild *dst, uint64_t dst_offset, + uint64_t bytes, BdrvRequestFlags flags) +{ +BDRVRawState *s = bs->opaque; +BDRVRawState *src_s; + +assert(dst->bs =
[Qemu-devel] [PATCH v9 03/10] raw: Implement copy offloading
Just pass down to ->file.

Signed-off-by: Fam Zheng
Reviewed-by: Stefan Hajnoczi
---
 block/raw-format.c | 32
 1 file changed, 32 insertions(+)

diff --git a/block/raw-format.c b/block/raw-format.c
index b69a0674b3..f2e468df6f 100644
--- a/block/raw-format.c
+++ b/block/raw-format.c
@@ -497,6 +497,36 @@ static int raw_probe_geometry(BlockDriverState *bs, HDGeometry *geo)
     return bdrv_probe_geometry(bs->file->bs, geo);
 }
 
+static int coroutine_fn raw_co_copy_range_from(BlockDriverState *bs,
+                                               BdrvChild *src, uint64_t src_offset,
+                                               BdrvChild *dst, uint64_t dst_offset,
+                                               uint64_t bytes, BdrvRequestFlags flags)
+{
+    int ret;
+
+    ret = raw_adjust_offset(bs, &src_offset, bytes, false);
+    if (ret) {
+        return ret;
+    }
+    return bdrv_co_copy_range_from(bs->file, src_offset, dst, dst_offset,
+                                   bytes, flags);
+}
+
+static int coroutine_fn raw_co_copy_range_to(BlockDriverState *bs,
+                                             BdrvChild *src, uint64_t src_offset,
+                                             BdrvChild *dst, uint64_t dst_offset,
+                                             uint64_t bytes, BdrvRequestFlags flags)
+{
+    int ret;
+
+    ret = raw_adjust_offset(bs, &dst_offset, bytes, true);
+    if (ret) {
+        return ret;
+    }
+    return bdrv_co_copy_range_to(src, src_offset, bs->file, dst_offset, bytes,
+                                 flags);
+}
+
 BlockDriver bdrv_raw = {
     .format_name          = "raw",
     .instance_size        = sizeof(BDRVRawState),
@@ -513,6 +543,8 @@ BlockDriver bdrv_raw = {
     .bdrv_co_pwrite_zeroes = &raw_co_pwrite_zeroes,
     .bdrv_co_pdiscard     = &raw_co_pdiscard,
     .bdrv_co_block_status = &raw_co_block_status,
+    .bdrv_co_copy_range_from = &raw_co_copy_range_from,
+    .bdrv_co_copy_range_to  = &raw_co_copy_range_to,
     .bdrv_truncate        = &raw_truncate,
     .bdrv_getlength       = &raw_getlength,
     .has_variable_length  = true,
-- 
2.17.0
[Qemu-devel] [PATCH v9 06/10] iscsi: Query and save device designator when opening
The device designator data returned in INQUIRY command will be useful to fill in source/target fields during copy offloading. Do this when connecting to the target and save the data for later use. Signed-off-by: Fam Zheng Reviewed-by: Stefan Hajnoczi --- block/iscsi.c | 41 + 1 file changed, 41 insertions(+) diff --git a/block/iscsi.c b/block/iscsi.c index 3fd7203916..6d0035d4b9 100644 --- a/block/iscsi.c +++ b/block/iscsi.c @@ -68,6 +68,7 @@ typedef struct IscsiLun { QemuMutex mutex; struct scsi_inquiry_logical_block_provisioning lbp; struct scsi_inquiry_block_limits bl; +struct scsi_inquiry_device_designator *dd; unsigned char *zeroblock; /* The allocmap tracks which clusters (pages) on the iSCSI target are * allocated and which are not. In case a target returns zeros for @@ -1740,6 +1741,30 @@ static QemuOptsList runtime_opts = { }, }; +static void iscsi_save_designator(IscsiLun *lun, + struct scsi_inquiry_device_identification *inq_di) +{ +struct scsi_inquiry_device_designator *desig, *copy = NULL; + +for (desig = inq_di->designators; desig; desig = desig->next) { +if (desig->association || +desig->designator_type > SCSI_DESIGNATOR_TYPE_NAA) { +continue; +} +/* NAA works better than T10 vendor ID based designator. 
*/ +if (!copy || copy->designator_type < desig->designator_type) { +copy = desig; +} +} +if (copy) { +lun->dd = g_new(struct scsi_inquiry_device_designator, 1); +*lun->dd = *copy; +lun->dd->next = NULL; +lun->dd->designator = g_malloc(copy->designator_length); +memcpy(lun->dd->designator, copy->designator, copy->designator_length); +} +} + static int iscsi_open(BlockDriverState *bs, QDict *options, int flags, Error **errp) { @@ -1922,6 +1947,7 @@ static int iscsi_open(BlockDriverState *bs, QDict *options, int flags, struct scsi_task *inq_task; struct scsi_inquiry_logical_block_provisioning *inq_lbp; struct scsi_inquiry_block_limits *inq_bl; +struct scsi_inquiry_device_identification *inq_di; switch (inq_vpd->pages[i]) { case SCSI_INQUIRY_PAGECODE_LOGICAL_BLOCK_PROVISIONING: inq_task = iscsi_do_inquiry(iscsilun->iscsi, iscsilun->lun, 1, @@ -1947,6 +1973,17 @@ static int iscsi_open(BlockDriverState *bs, QDict *options, int flags, sizeof(struct scsi_inquiry_block_limits)); scsi_free_scsi_task(inq_task); break; +case SCSI_INQUIRY_PAGECODE_DEVICE_IDENTIFICATION: +inq_task = iscsi_do_inquiry(iscsilun->iscsi, iscsilun->lun, 1, + SCSI_INQUIRY_PAGECODE_DEVICE_IDENTIFICATION, +(void **) &inq_di, errp); +if (inq_task == NULL) { +ret = -EINVAL; +goto out; +} +iscsi_save_designator(iscsilun, inq_di); +scsi_free_scsi_task(inq_task); +break; default: break; } @@ -2003,6 +2040,10 @@ static void iscsi_close(BlockDriverState *bs) iscsi_logout_sync(iscsi); } iscsi_destroy_context(iscsi); +if (iscsilun->dd) { +g_free(iscsilun->dd->designator); +g_free(iscsilun->dd); +} g_free(iscsilun->zeroblock); iscsi_allocmap_free(iscsilun); qemu_mutex_destroy(&iscsilun->mutex); -- 2.17.0
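The selection logic in iscsi_save_designator() can be illustrated without libiscsi. The sketch below is a standalone approximation: struct designator, DESIG_TYPE_NAA, designator_save_best() and designator_free() are hypothetical stand-ins for the libiscsi types and constants used by the patch, and the value 3 for the NAA type is an assumption of the sketch. It walks a linked list, skips entries that are not associated with the logical unit or whose type is above NAA, keeps the highest remaining type, and returns a deep copy so that the result outlives the inquiry data.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Assumed value; stands in for libiscsi's SCSI_DESIGNATOR_TYPE_NAA. */
enum { DESIG_TYPE_NAA = 3 };

/* Hypothetical, simplified designator record. */
struct designator {
    struct designator *next;
    int association;        /* 0 means "associated with the logical unit" */
    int type;
    size_t length;
    unsigned char *data;
};

/* Pick the preferred designator and return a deep copy, or NULL if none. */
static struct designator *designator_save_best(const struct designator *list)
{
    const struct designator *d, *best = NULL;
    struct designator *copy;

    for (d = list; d; d = d->next) {
        if (d->association != 0 || d->type > DESIG_TYPE_NAA) {
            continue;
        }
        /* A higher type wins, so NAA is preferred over T10 vendor ID. */
        if (!best || best->type < d->type) {
            best = d;
        }
    }
    if (!best) {
        return NULL;
    }

    copy = malloc(sizeof(*copy));
    if (!copy) {
        abort();
    }
    *copy = *best;
    copy->next = NULL;      /* the copy must not point into the source list */
    copy->data = malloc(best->length ? best->length : 1);
    if (!copy->data) {
        abort();
    }
    if (best->length) {
        memcpy(copy->data, best->data, best->length);
    }
    return copy;
}

/* Release a copy made by designator_save_best(); NULL is accepted. */
static void designator_free(struct designator *d)
{
    if (d) {
        free(d->data);
        free(d);
    }
}
```

Copying the payload as well as the struct matters because the source list belongs to the inquiry task, which is freed right after the designator is saved.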
[Qemu-devel] [PATCH v9 09/10] block-backend: Add blk_co_copy_range
It's a BlockBackend wrapper of the BDS interface.

Signed-off-by: Fam Zheng
Reviewed-by: Stefan Hajnoczi
---
 block/block-backend.c          | 18 ++++++++++++++++++
 include/sysemu/block-backend.h |  4 ++++
 2 files changed, 22 insertions(+)

diff --git a/block/block-backend.c b/block/block-backend.c
index 89f47b00ea..d55c328736 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -2211,3 +2211,21 @@ void blk_unregister_buf(BlockBackend *blk, void *host)
 {
     bdrv_unregister_buf(blk_bs(blk), host);
 }
+
+int coroutine_fn blk_co_copy_range(BlockBackend *blk_in, int64_t off_in,
+                                   BlockBackend *blk_out, int64_t off_out,
+                                   int bytes, BdrvRequestFlags flags)
+{
+    int r;
+    r = blk_check_byte_request(blk_in, off_in, bytes);
+    if (r) {
+        return r;
+    }
+    r = blk_check_byte_request(blk_out, off_out, bytes);
+    if (r) {
+        return r;
+    }
+    return bdrv_co_copy_range(blk_in->root, off_in,
+                              blk_out->root, off_out,
+                              bytes, flags);
+}
diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h
index 92ab624fac..8d03d493c2 100644
--- a/include/sysemu/block-backend.h
+++ b/include/sysemu/block-backend.h
@@ -232,4 +232,8 @@ void blk_set_force_allow_inactivate(BlockBackend *blk);
 void blk_register_buf(BlockBackend *blk, void *host, size_t size);
 void blk_unregister_buf(BlockBackend *blk, void *host);
 
+int coroutine_fn blk_co_copy_range(BlockBackend *blk_in, int64_t off_in,
+                                   BlockBackend *blk_out, int64_t off_out,
+                                   int bytes, BdrvRequestFlags flags);
+
 #endif
--
2.17.0
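
The wrapper follows a "validate both ends, then delegate" pattern: the
copy is only forwarded once the source range and the destination range
have each passed the byte-request check. A minimal sketch of that
pattern, assuming a hypothetical in-memory backend (struct mem_backend,
mem_check_byte_request and mem_copy_range are illustrative names, not
QEMU APIs, and the real blk_check_byte_request() performs more checks
than the bounds test shown here):

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a BlockBackend: a flat in-memory image. */
struct mem_backend {
    unsigned char *data;
    int64_t length;
};

/* Reject requests that start or end outside the image. */
static int mem_check_byte_request(const struct mem_backend *b,
                                  int64_t offset, int64_t bytes)
{
    if (offset < 0 || bytes < 0 || offset > b->length ||
        bytes > b->length - offset) {
        return -EIO;
    }
    return 0;
}

/* Check the source range, then the destination range, then copy. */
static int mem_copy_range(struct mem_backend *in, int64_t off_in,
                          struct mem_backend *out, int64_t off_out,
                          int64_t bytes)
{
    int r;

    r = mem_check_byte_request(in, off_in, bytes);
    if (r) {
        return r;
    }
    r = mem_check_byte_request(out, off_out, bytes);
    if (r) {
        return r;
    }
    /* memmove tolerates in == out with overlapping ranges */
    memmove(out->data + off_out, in->data + off_in, (size_t)bytes);
    return 0;
}
```

Because both checks run before any data moves, a request that is valid
for the source but overruns the destination fails with -EIO and leaves
the destination untouched.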
[Qemu-devel] [PATCH v9 04/10] qcow2: Implement copy offloading
The two callbacks are implemented quite similarly to the read/write
functions: bdrv_co_copy_range_from maps for read and calls into bs->file
or bs->backing depending on the allocation status; bdrv_co_copy_range_to
maps for write and calls into bs->file.

Reviewed-by: Stefan Hajnoczi
Signed-off-by: Fam Zheng
---
 block/qcow2.c | 229 +++---
 1 file changed, 199 insertions(+), 30 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 6d532470a8..8f89c4fe72 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1762,6 +1762,39 @@ static int coroutine_fn qcow2_co_block_status(BlockDriverState *bs,
     return status;
 }
 
+static coroutine_fn int qcow2_handle_l2meta(BlockDriverState *bs,
+                                            QCowL2Meta **pl2meta,
+                                            bool link_l2)
+{
+    int ret = 0;
+    QCowL2Meta *l2meta = *pl2meta;
+
+    while (l2meta != NULL) {
+        QCowL2Meta *next;
+
+        if (!ret && link_l2) {
+            ret = qcow2_alloc_cluster_link_l2(bs, l2meta);
+            if (ret) {
+                goto out;
+            }
+        }
+
+        /* Take the request off the list of running requests */
+        if (l2meta->nb_clusters != 0) {
+            QLIST_REMOVE(l2meta, next_in_flight);
+        }
+
+        qemu_co_queue_restart_all(&l2meta->dependent_requests);
+
+        next = l2meta->next;
+        g_free(l2meta);
+        l2meta = next;
+    }
+out:
+    *pl2meta = l2meta;
+    return ret;
+}
+
 static coroutine_fn int qcow2_co_preadv(BlockDriverState *bs, uint64_t offset,
                                         uint64_t bytes, QEMUIOVector *qiov,
                                         int flags)
@@ -2048,24 +2081,9 @@ static coroutine_fn int qcow2_co_pwritev(BlockDriverState *bs, uint64_t offset,
             }
         }
 
-        while (l2meta != NULL) {
-            QCowL2Meta *next;
-
-            ret = qcow2_alloc_cluster_link_l2(bs, l2meta);
-            if (ret < 0) {
-                goto fail;
-            }
-
-            /* Take the request off the list of running requests */
-            if (l2meta->nb_clusters != 0) {
-                QLIST_REMOVE(l2meta, next_in_flight);
-            }
-
-            qemu_co_queue_restart_all(&l2meta->dependent_requests);
-
-            next = l2meta->next;
-            g_free(l2meta);
-            l2meta = next;
+        ret = qcow2_handle_l2meta(bs, &l2meta, true);
+        if (ret) {
+            goto fail;
         }
 
         bytes -= cur_bytes;
@@ -2076,18 +2094,7 @@ static coroutine_fn int qcow2_co_pwritev(BlockDriverState *bs, uint64_t offset,
     ret = 0;
 
 fail:
-    while (l2meta != NULL) {
-        QCowL2Meta *next;
-
-        if (l2meta->nb_clusters != 0) {
-            QLIST_REMOVE(l2meta, next_in_flight);
-        }
-        qemu_co_queue_restart_all(&l2meta->dependent_requests);
-
-        next = l2meta->next;
-        g_free(l2meta);
-        l2meta = next;
-    }
+    qcow2_handle_l2meta(bs, &l2meta, false);
 
     qemu_co_mutex_unlock(&s->lock);
 
@@ -3274,6 +3281,166 @@ static coroutine_fn int qcow2_co_pdiscard(BlockDriverState *bs,
     return ret;
 }
 
+static int coroutine_fn
+qcow2_co_copy_range_from(BlockDriverState *bs,
+                         BdrvChild *src, uint64_t src_offset,
+                         BdrvChild *dst, uint64_t dst_offset,
+                         uint64_t bytes, BdrvRequestFlags flags)
+{
+    BDRVQcow2State *s = bs->opaque;
+    int ret;
+    unsigned int cur_bytes; /* number of bytes in current iteration */
+    BdrvChild *child = NULL;
+    BdrvRequestFlags cur_flags;
+
+    assert(!bs->encrypted);
+    qemu_co_mutex_lock(&s->lock);
+
+    while (bytes != 0) {
+        uint64_t copy_offset = 0;
+        /* prepare next request */
+        cur_bytes = MIN(bytes, INT_MAX);
+        cur_flags = flags;
+
+        ret = qcow2_get_cluster_offset(bs, src_offset, &cur_bytes, &copy_offset);
+        if (ret < 0) {
+            goto out;
+        }
+
+        switch (ret) {
+        case QCOW2_CLUSTER_UNALLOCATED:
+            if (bs->backing && bs->backing->bs) {
+                int64_t backing_length = bdrv_getlength(bs->backing->bs);
+                if (src_offset >= backing_length) {
+                    cur_flags |= BDRV_REQ_ZERO_WRITE;
+                } else {
+                    child = bs->backing;
+                    cur_bytes = MIN(cur_bytes, backing_length - src_offset);
+                    copy_offset = src_offset;
+                }
+            } else {
+                cur_flags |= BDRV_REQ_ZERO_WRITE;
+            }
+            break;
+
+        case QCOW2_CLUSTER_ZERO_PLAIN:
+        case QCOW2_CLUSTER_ZERO_ALLOC:
+            cur_flags |= BDRV_REQ_ZERO_WRITE;
+            break;
+
+        case QCOW2_CLUSTER_COMPRESSED:
+            ret = -ENOTSUP;
+            goto out;
+            break;
+
+        case QCOW2_CLUSTER_NORMAL:
+            child = bs->file;
+            c
[Qemu-devel] [PATCH v9 07/10] iscsi: Create and use iscsi_co_wait_for_task
This loop is repeated a growing number of times. Make a helper.

Signed-off-by: Fam Zheng
Reviewed-by: Stefan Hajnoczi
Reviewed-by: Eric Blake
---
 block/iscsi.c | 54 +++++++++++++++++-------------------------------------
 1 file changed, 17 insertions(+), 37 deletions(-)

diff --git a/block/iscsi.c b/block/iscsi.c
index 6d0035d4b9..6a365cb07b 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -556,6 +556,17 @@ static inline bool iscsi_allocmap_is_valid(IscsiLun *iscsilun,
                                offset / iscsilun->cluster_size) == size);
 }
 
+static void coroutine_fn iscsi_co_wait_for_task(IscsiTask *iTask,
+                                                IscsiLun *iscsilun)
+{
+    while (!iTask->complete) {
+        iscsi_set_events(iscsilun);
+        qemu_mutex_unlock(&iscsilun->mutex);
+        qemu_coroutine_yield();
+        qemu_mutex_lock(&iscsilun->mutex);
+    }
+}
+
 static int coroutine_fn
 iscsi_co_writev(BlockDriverState *bs, int64_t sector_num, int nb_sectors,
                 QEMUIOVector *iov, int flags)
@@ -617,12 +628,7 @@ retry:
     scsi_task_set_iov_out(iTask.task, (struct scsi_iovec *) iov->iov,
                           iov->niov);
 #endif
-    while (!iTask.complete) {
-        iscsi_set_events(iscsilun);
-        qemu_mutex_unlock(&iscsilun->mutex);
-        qemu_coroutine_yield();
-        qemu_mutex_lock(&iscsilun->mutex);
-    }
+    iscsi_co_wait_for_task(&iTask, iscsilun);
 
     if (iTask.task != NULL) {
         scsi_free_scsi_task(iTask.task);
@@ -693,13 +699,7 @@ retry:
         ret = -ENOMEM;
         goto out_unlock;
     }
-
-    while (!iTask.complete) {
-        iscsi_set_events(iscsilun);
-        qemu_mutex_unlock(&iscsilun->mutex);
-        qemu_coroutine_yield();
-        qemu_mutex_lock(&iscsilun->mutex);
-    }
+    iscsi_co_wait_for_task(&iTask, iscsilun);
 
     if (iTask.do_retry) {
         if (iTask.task != NULL) {
@@ -863,13 +863,8 @@ retry:
 #if LIBISCSI_API_VERSION < (20160603)
     scsi_task_set_iov_in(iTask.task, (struct scsi_iovec *) iov->iov, iov->niov);
 #endif
-    while (!iTask.complete) {
-        iscsi_set_events(iscsilun);
-        qemu_mutex_unlock(&iscsilun->mutex);
-        qemu_coroutine_yield();
-        qemu_mutex_lock(&iscsilun->mutex);
-    }
 
+    iscsi_co_wait_for_task(&iTask, iscsilun);
     if (iTask.task != NULL) {
         scsi_free_scsi_task(iTask.task);
         iTask.task = NULL;
@@ -906,12 +901,7 @@ retry:
         return -ENOMEM;
     }
 
-    while (!iTask.complete) {
-        iscsi_set_events(iscsilun);
-        qemu_mutex_unlock(&iscsilun->mutex);
-        qemu_coroutine_yield();
-        qemu_mutex_lock(&iscsilun->mutex);
-    }
+    iscsi_co_wait_for_task(&iTask, iscsilun);
 
     if (iTask.task != NULL) {
         scsi_free_scsi_task(iTask.task);
@@ -1143,12 +1133,7 @@ retry:
         goto out_unlock;
     }
 
-    while (!iTask.complete) {
-        iscsi_set_events(iscsilun);
-        qemu_mutex_unlock(&iscsilun->mutex);
-        qemu_coroutine_yield();
-        qemu_mutex_lock(&iscsilun->mutex);
-    }
+    iscsi_co_wait_for_task(&iTask, iscsilun);
 
     if (iTask.task != NULL) {
         scsi_free_scsi_task(iTask.task);
@@ -1244,12 +1229,7 @@ retry:
         return -ENOMEM;
     }
 
-    while (!iTask.complete) {
-        iscsi_set_events(iscsilun);
-        qemu_mutex_unlock(&iscsilun->mutex);
-        qemu_coroutine_yield();
-        qemu_mutex_lock(&iscsilun->mutex);
-    }
+    iscsi_co_wait_for_task(&iTask, iscsilun);
 
     if (iTask.status == SCSI_STATUS_CHECK_CONDITION &&
         iTask.task->sense.key == SCSI_SENSE_ILLEGAL_REQUEST &&
--
2.17.0
Re: [Qemu-devel] [PATCH v3 00/17] tcg: tb_lock removal redux v3
Richard Henderson writes:

> On 05/30/2018 03:46 PM, Richard Henderson wrote:
>> Thanks. Queued to tcg-next.
>
> Hmph. Unqueued, at least for now.
>
> ERROR:/home/rth/work/qemu/qemu/accel/tcg/translate-all.c:615:page_unlock__debug:
> assertion failed: (page_is_locked(pd))
>
> #3 0x74b6915e in g_assertion_message_expr ()
>     at /lib64/libglib-2.0.so.0
> #4 0x5583c088 in page_unlock__debug (pd=0x7fffa423aa80)
>     at /home/rth/work/qemu/qemu/accel/tcg/translate-all.c:615
> #5 0x5583c1be in page_unlock (pd=0x7fffa423aa80)
>     at /home/rth/work/qemu/qemu/accel/tcg/translate-all.c:661
> #6 0x5583c2ef in page_entry_destroy (p=0x7fffa8024460)
>     at /home/rth/work/qemu/qemu/accel/tcg/translate-all.c:694
> #7 0x74b6f448 in () at /lib64/libglib-2.0.so.0
> #8 0x74b6fea2 in g_tree_destroy () at /lib64/libglib-2.0.so.0
> #9 0x5583c791 in page_collection_unlock (set=0x7fffa802eba0)
>     at /home/rth/work/qemu/qemu/accel/tcg/translate-all.c:842
> #10 0x557b301a in memory_notdirty_write_complete (ndi=0x7fffd9cf6050)
>     at /home/rth/work/qemu/qemu/exec.c:2495
> #11 0x557b317f in notdirty_mem_write (opaque=0x0, ram_addr=12334096,
>     val=18446739675675374544, size=8) at /home/rth/work/qemu/qemu/exec.c:2535
> #12 0x5580f14b in memory_region_write_accessor (mr=0x562a38a0
>     , addr=12334096, value=0x7fffd9cf6178, size=8, shift=0,
>     mask=18446744073709551615, attrs=...) at /home/rth/work/qemu/qemu/memory.c:530
> #13 0x5580f360 in access_with_adjusted_size (addr=12334096,
>     value=0x7fffd9cf6178, size=8, access_size_min=1, access_size_max=8, access_fn=
>     0x5580f061 , mr=0x562a38a0
>     , attrs=...) at /home/rth/work/qemu/qemu/memory.c:597
> #14 0x55811cef in memory_region_dispatch_write (mr=0x562a38a0
>     , addr=12334096, data=18446739675675374544, size=8,
>     attrs=...)
>     at /home/rth/work/qemu/qemu/memory.c:1474
> #15 0x55825d73 in io_writex (env=0x56869090,
>     iotlbentry=0x56870520, mmu_idx=0, val=18446739675675374544,
>     addr=18446739675675374608, retaddr=140736231479305, size=8) at
>     /home/rth/work/qemu/qemu/accel/tcg/cputlb.c:813
> #16 0x55828b6d in io_writeq (env=0x56869090, mmu_idx=0, index=225,
>     val=18446739675675374544, addr=18446739675675374608, retaddr=140736231479305)
>     at /home/rth/work/qemu/qemu/accel/tcg/softmmu_template.h:265
> #17 0x55828d2c in helper_le_stq_mmu (env=0x56869090,
>     addr=18446739675675374608, val=18446739675675374544, oi=48,
>     retaddr=140736231479305)
>     at /home/rth/work/qemu/qemu/accel/tcg/softmmu_template.h:301
> #18 0x7fffb5159809 in code_gen_buffer ()
>
> I can invoke similar crashes with just about every image I try.

Just booting up? I've been hammering builds in my system image with
debug-tcg enabled and haven't triggered it yet. Using:

  ./aarch64-softmmu/qemu-system-aarch64 \
    -machine virt,graphics=on,gic-version=3,virtualization=on \
    -cpu cortex-a53 \
    --serial mon:stdio \
    -nic user,model=virtio-net-pci,hostfwd=tcp::-:22 \
    -device virtio-blk-device,drive=myblock \
    -drive file=/home/alex/lsrc/qemu/images/debian-stable-arm64.qcow2,id=myblock,index=0,if=none \
    -kernel /home/alex/lsrc/qemu/images/aarch64-current-linux-kernel-only.img \
    -append "console=ttyAMA0 root=/dev/vda1" \
    -display none -m 4096 -name debug-threads=on -smp 8

--
Alex Bennée
[Qemu-devel] [PATCH v9 08/10] iscsi: Implement copy offloading
Issue EXTENDED COPY (LID1) command to implement the copy_range API.

The parameter data construction code is modified from libiscsi's
iscsi-dd.c.

Signed-off-by: Fam Zheng
Reviewed-by: Stefan Hajnoczi
---
 block/iscsi.c            | 219 +++
 include/scsi/constants.h |   4 +
 2 files changed, 223 insertions(+)

diff --git a/block/iscsi.c b/block/iscsi.c
index 6a365cb07b..c2fbd8a8aa 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -2205,6 +2205,221 @@ static void coroutine_fn iscsi_co_invalidate_cache(BlockDriverState *bs,
     iscsi_allocmap_invalidate(iscsilun);
 }
 
+static int coroutine_fn iscsi_co_copy_range_from(BlockDriverState *bs,
+                                                 BdrvChild *src,
+                                                 uint64_t src_offset,
+                                                 BdrvChild *dst,
+                                                 uint64_t dst_offset,
+                                                 uint64_t bytes,
+                                                 BdrvRequestFlags flags)
+{
+    return bdrv_co_copy_range_to(src, src_offset, dst, dst_offset, bytes, flags);
+}
+
+static struct scsi_task *iscsi_xcopy_task(int param_len)
+{
+    struct scsi_task *task;
+
+    task = g_new0(struct scsi_task, 1);
+
+    task->cdb[0] = EXTENDED_COPY;
+    task->cdb[10]= (param_len >> 24) & 0xFF;
+    task->cdb[11]= (param_len >> 16) & 0xFF;
+    task->cdb[12]= (param_len >> 8) & 0xFF;
+    task->cdb[13]= param_len & 0xFF;
+    task->cdb_size = 16;
+    task->xfer_dir = SCSI_XFER_WRITE;
+    task->expxferlen = param_len;
+
+    return task;
+}
+
+static void iscsi_populate_target_desc(unsigned char *desc, IscsiLun *lun)
+{
+    struct scsi_inquiry_device_designator *dd = lun->dd;
+
+    memset(desc, 0, 32);
+    desc[0] = 0xE4; /* IDENT_DESCR_TGT_DESCR */
+    desc[4] = dd->code_set;
+    desc[5] = (dd->designator_type & 0xF)
+        | ((dd->association & 3) << 4);
+    desc[7] = dd->designator_length;
+    memcpy(desc + 8, dd->designator, dd->designator_length);
+
+    desc[28] = 0;
+    desc[29] = (lun->block_size >> 16) & 0xFF;
+    desc[30] = (lun->block_size >> 8) & 0xFF;
+    desc[31] = lun->block_size & 0xFF;
+}
+
+static void iscsi_xcopy_desc_hdr(uint8_t *hdr, int dc, int cat, int src_index,
+                                 int dst_index)
+{
+    hdr[0] = 0x02; /* BLK_TO_BLK_SEG_DESCR */
+    hdr[1] = ((dc << 1) | cat) & 0xFF;
+    hdr[2] = (XCOPY_BLK2BLK_SEG_DESC_SIZE >> 8) & 0xFF;
+    /* don't account for the first 4 bytes in descriptor header*/
+    hdr[3] = (XCOPY_BLK2BLK_SEG_DESC_SIZE - 4 /* SEG_DESC_SRC_INDEX_OFFSET */) & 0xFF;
+    hdr[4] = (src_index >> 8) & 0xFF;
+    hdr[5] = src_index & 0xFF;
+    hdr[6] = (dst_index >> 8) & 0xFF;
+    hdr[7] = dst_index & 0xFF;
+}
+
+static void iscsi_xcopy_populate_desc(uint8_t *desc, int dc, int cat,
+                                      int src_index, int dst_index, int num_blks,
+                                      uint64_t src_lba, uint64_t dst_lba)
+{
+    iscsi_xcopy_desc_hdr(desc, dc, cat, src_index, dst_index);
+
+    /* The caller should verify the request size */
+    assert(num_blks < 65536);
+    desc[10] = (num_blks >> 8) & 0xFF;
+    desc[11] = num_blks & 0xFF;
+    desc[12] = (src_lba >> 56) & 0xFF;
+    desc[13] = (src_lba >> 48) & 0xFF;
+    desc[14] = (src_lba >> 40) & 0xFF;
+    desc[15] = (src_lba >> 32) & 0xFF;
+    desc[16] = (src_lba >> 24) & 0xFF;
+    desc[17] = (src_lba >> 16) & 0xFF;
+    desc[18] = (src_lba >> 8) & 0xFF;
+    desc[19] = src_lba & 0xFF;
+    desc[20] = (dst_lba >> 56) & 0xFF;
+    desc[21] = (dst_lba >> 48) & 0xFF;
+    desc[22] = (dst_lba >> 40) & 0xFF;
+    desc[23] = (dst_lba >> 32) & 0xFF;
+    desc[24] = (dst_lba >> 24) & 0xFF;
+    desc[25] = (dst_lba >> 16) & 0xFF;
+    desc[26] = (dst_lba >> 8) & 0xFF;
+    desc[27] = dst_lba & 0xFF;
+}
+
+static void iscsi_xcopy_populate_header(unsigned char *buf, int list_id, int str,
+                                        int list_id_usage, int prio,
+                                        int tgt_desc_len,
+                                        int seg_desc_len, int inline_data_len)
+{
+    buf[0] = list_id;
+    buf[1] = ((str & 1) << 5) | ((list_id_usage & 3) << 3) | (prio & 7);
+    buf[2] = (tgt_desc_len >> 8) & 0xFF;
+    buf[3] = tgt_desc_len & 0xFF;
+    buf[8] = (seg_desc_len >> 24) & 0xFF;
+    buf[9] = (seg_desc_len >> 16) & 0xFF;
+    buf[10] = (seg_desc_len >> 8) & 0xFF;
+    buf[11] = seg_desc_len & 0xFF;
+    buf[12] = (inline_data_len >> 24) & 0xFF;
+    buf[13] = (inline_data_len >> 16) & 0xFF;
+    buf[14] = (inline_data_len >> 8) & 0xFF;
+    buf[15] = inline_data_len & 0xFF;
+}
+
+static void iscsi_xcopy_data(struct iscsi_data *data,
+                             IscsiLun *src, int64_t src_lba,
+                             IscsiLun *dst, int64_t dst_lba,
+                             uint16_t num_blocks)
+{
+    uint8_
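
The shift-and-mask sequences in the patch above all store integers in
big-endian byte order at fixed offsets of a descriptor. A small helper
makes the layout easier to see. This is only a sketch: put_be() and
fill_blk2blk_desc() are hypothetical names, and the 28-byte descriptor
size is an assumption inferred from the highest index the patch writes
(desc[27]), since the value of XCOPY_BLK2BLK_SEG_DESC_SIZE is not shown
in this message.

```c
#include <stdint.h>
#include <string.h>

#define SEG_DESC_SIZE 28    /* assumed size of a block-to-block descriptor */

/* Store the low nbytes of v at p, most significant byte first. */
static void put_be(uint8_t *p, uint64_t v, int nbytes)
{
    int i;

    for (i = nbytes - 1; i >= 0; i--) {
        p[i] = v & 0xFF;
        v >>= 8;
    }
}

/* Fill one segment descriptor using the offsets from the patch. */
static void fill_blk2blk_desc(uint8_t desc[SEG_DESC_SIZE], int dc, int cat,
                              int src_index, int dst_index,
                              uint16_t num_blks,
                              uint64_t src_lba, uint64_t dst_lba)
{
    memset(desc, 0, SEG_DESC_SIZE);
    desc[0] = 0x02;                         /* block-to-block segment */
    desc[1] = ((dc << 1) | cat) & 0xFF;
    put_be(desc + 2, SEG_DESC_SIZE - 4, 2); /* length after the 4-byte header */
    put_be(desc + 4, (uint64_t)src_index, 2);
    put_be(desc + 6, (uint64_t)dst_index, 2);
    /* bytes 8..9 stay zero */
    put_be(desc + 10, num_blks, 2);
    put_be(desc + 12, src_lba, 8);
    put_be(desc + 20, dst_lba, 8);
}
```

Taking num_blks as uint16_t replaces the patch's runtime assertion that
the block count is below 65536 with a type-level limit; that is a design
choice of this sketch, not something the patch does.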
Re: [Qemu-devel] [PATCH v8 06/11] file-posix: Implement bdrv_co_copy_range
On Fri, Jun 01, 2018 at 02:28:44PM +0800, Fam Zheng wrote:
> With copy_file_range(2), we can implement the bdrv_co_copy_range
> semantics.
>
> Signed-off-by: Fam Zheng
> ---
>  block/file-posix.c      | 98 +++--
>  configure               | 17 +++
>  include/block/raw-aio.h | 10 -
>  3 files changed, 120 insertions(+), 5 deletions(-)

Reviewed-by: Stefan Hajnoczi
[Qemu-devel] [PATCH v9 10/10] qemu-img: Convert with copy offloading
The new blk_co_copy_range interface offers a more efficient way in the
case of network based storage. Make use of it to allow faster convert
operation.

Since copy offloading cannot do zero detection ('-S') and compression
(-c), only try it when these options are not used.

Signed-off-by: Fam Zheng
Reviewed-by: Stefan Hajnoczi
---
 qemu-img.c | 50 ++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 48 insertions(+), 2 deletions(-)

diff --git a/qemu-img.c b/qemu-img.c
index 976b437da0..75f1610aa0 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -1547,6 +1547,7 @@ typedef struct ImgConvertState {
     bool compressed;
     bool target_has_backing;
     bool wr_in_order;
+    bool copy_range;
     int min_sparse;
     size_t cluster_sectors;
     size_t buf_sectors;
@@ -1740,6 +1741,37 @@ static int coroutine_fn convert_co_write(ImgConvertState *s, int64_t sector_num,
     return 0;
 }
 
+static int coroutine_fn convert_co_copy_range(ImgConvertState *s, int64_t sector_num,
+                                              int nb_sectors)
+{
+    int n, ret;
+
+    while (nb_sectors > 0) {
+        BlockBackend *blk;
+        int src_cur;
+        int64_t bs_sectors, src_cur_offset;
+        int64_t offset;
+
+        convert_select_part(s, sector_num, &src_cur, &src_cur_offset);
+        offset = (sector_num - src_cur_offset) << BDRV_SECTOR_BITS;
+        blk = s->src[src_cur];
+        bs_sectors = s->src_sectors[src_cur];
+
+        n = MIN(nb_sectors, bs_sectors - (sector_num - src_cur_offset));
+
+        ret = blk_co_copy_range(blk, offset, s->target,
+                                sector_num << BDRV_SECTOR_BITS,
+                                n << BDRV_SECTOR_BITS, 0);
+        if (ret < 0) {
+            return ret;
+        }
+
+        sector_num += n;
+        nb_sectors -= n;
+    }
+    return 0;
+}
+
 static void coroutine_fn convert_co_do_copy(void *opaque)
 {
     ImgConvertState *s = opaque;
@@ -1762,6 +1794,7 @@ static void coroutine_fn convert_co_do_copy(void *opaque)
         int n;
         int64_t sector_num;
         enum ImgConvertBlockStatus status;
+        bool copy_range;
 
         qemu_co_mutex_lock(&s->lock);
         if (s->ret != -EINPROGRESS || s->sector_num >= s->total_sectors) {
@@ -1791,7 +1824,9 @@ static void coroutine_fn convert_co_do_copy(void *opaque)
                                         s->allocated_sectors, 0);
         }
 
-        if (status == BLK_DATA) {
+retry:
+        copy_range = s->copy_range && s->status == BLK_DATA;
+        if (status == BLK_DATA && !copy_range) {
             ret = convert_co_read(s, sector_num, n, buf);
             if (ret < 0) {
                 error_report("error while reading sector %" PRId64
@@ -1813,7 +1848,15 @@ static void coroutine_fn convert_co_do_copy(void *opaque)
         }
 
         if (s->ret == -EINPROGRESS) {
-            ret = convert_co_write(s, sector_num, n, buf, status);
+            if (copy_range) {
+                ret = convert_co_copy_range(s, sector_num, n);
+                if (ret) {
+                    s->copy_range = false;
+                    goto retry;
+                }
+            } else {
+                ret = convert_co_write(s, sector_num, n, buf, status);
+            }
             if (ret < 0) {
                 error_report("error while writing sector %" PRId64
                              ": %s", sector_num, strerror(-ret));
@@ -1936,6 +1979,7 @@ static int img_convert(int argc, char **argv)
     ImgConvertState s = (ImgConvertState) {
         /* Need at least 4k of zeros for sparse detection */
         .min_sparse = 8,
+        .copy_range = true,
         .buf_sectors= IO_BUF_SIZE / BDRV_SECTOR_SIZE,
         .wr_in_order= true,
         .num_coroutines = 8,
@@ -1976,6 +2020,7 @@ static int img_convert(int argc, char **argv)
             break;
         case 'c':
             s.compressed = true;
+            s.copy_range = false;
             break;
         case 'o':
             if (!is_valid_option_list(optarg)) {
@@ -2017,6 +2062,7 @@ static int img_convert(int argc, char **argv)
             }
 
             s.min_sparse = sval / BDRV_SECTOR_SIZE;
+            s.copy_range = false;
             break;
         }
         case 'p':
--
2.17.0
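
The control flow added to convert_co_do_copy() is "try the offloaded
copy first; on the first failure, switch it off for the rest of the job
and redo the chunk through the bounce buffer". A stripped-down sketch of
that sticky fallback, with every name hypothetical (struct copy_state,
copy_chunk and the two stub copiers are not part of qemu-img):

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical job state: a sticky "offload still allowed" flag plus
 * counters so the behaviour can be observed. */
struct copy_state {
    bool copy_range;
    int offload_calls;
    int bounce_calls;
};

typedef int (*copy_fn)(struct copy_state *s, long sector, int n);

/* Stub: an offload path that the storage does not support. */
static int offload_unsupported(struct copy_state *s, long sector, int n)
{
    (void)sector;
    (void)n;
    s->offload_calls++;
    return -ENOTSUP;
}

/* Stub: a read-then-write path that always succeeds. */
static int bounce_ok(struct copy_state *s, long sector, int n)
{
    (void)sector;
    (void)n;
    s->bounce_calls++;
    return 0;
}

/* Copy one chunk, falling back permanently once offloading fails. */
static int copy_chunk(struct copy_state *s, long sector, int n,
                      copy_fn offload, copy_fn bounce)
{
    if (s->copy_range) {
        if (offload(s, sector, n) == 0) {
            return 0;
        }
        s->copy_range = false;  /* do not try offloading again */
    }
    return bounce(s, sector, n);
}
```

Calling copy_chunk() twice with the stubs above invokes the offload path
once and the bounce path twice, which is the behaviour the `goto retry`
in the patch is after: the failed chunk is not lost, and later chunks do
not pay for another failed attempt.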
Re: [Qemu-devel] [PATCH v8 00/11] qemu-img convert with copy offloading
On Thu, May 31, 2018 at 11:45:17PM -0700, no-re...@patchew.org wrote:
> /var/tmp/patchew-tester-tmp-2l7s8dte/src/block/iscsi.c: In function
> ‘iscsi_populate_target_desc’:
> /var/tmp/patchew-tester-tmp-2l7s8dte/src/block/iscsi.c:2242:15: error:
> ‘IDENT_DESCR_TGT_DESCR’ undeclared (first use in this function); did you mean
> ‘IDENT_DESCR_TGT_DESCR_SIZE’?
>      desc[0] = IDENT_DESCR_TGT_DESCR;
>                ^
>                IDENT_DESCR_TGT_DESCR_SIZE
> /var/tmp/patchew-tester-tmp-2l7s8dte/src/block/iscsi.c:2242:15: note: each
> undeclared identifier is reported only once for each function it appears in

Fam, is this failure expected?

Aside from this I'm happy with the series.
Re: [Qemu-devel] [PATCH v2 05/20] 9p: Properly set errp in fstatfs error path
On Thu, 31 May 2018 21:26:00 -0400 Keno Fischer wrote:
> In the review of
>
>     9p: Avoid warning if FS_IOC_GETVERSION is not defined
>
> Greg Kurz noted this error path was failing to set errp.
> Fix that.
>
> Signed-off-by: Keno Fischer
> ---

This is a bug fix so I've applied it to 9p-next.

Thanks!

>
> Changes since v1: New patch
>
>  hw/9pfs/9p-local.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/hw/9pfs/9p-local.c b/hw/9pfs/9p-local.c
> index adc169a..576c8e3 100644
> --- a/hw/9pfs/9p-local.c
> +++ b/hw/9pfs/9p-local.c
> @@ -1420,6 +1420,8 @@ static int local_init(FsContext *ctx, Error **errp)
>       */
>      if (fstatfs(data->mountfd, &stbuf) < 0) {
>          close_preserve_errno(data->mountfd);
> +        error_setg_errno(errp, errno,
> +            "failed to stat file system at '%s'", ctx->fs_root);
>          goto err;
>      }
>      switch (stbuf.f_type) {
Re: [Qemu-devel] [PATCH v8 01/11] docker: Update fedora image to 28
On Fri, Jun 01, 2018 at 02:28:39PM +0800, Fam Zheng wrote:
> Signed-off-by: Fam Zheng
> ---
>  tests/docker/dockerfiles/fedora.docker | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Seems reasonable, Fedora is a cutting-edge distro. Unlike stable distros
like CentOS and Debian where we actually want the oldest supported
release, we want the latest release for Fedora.

Reviewed-by: Stefan Hajnoczi
Re: [Qemu-devel] Recording I/O activity after KVM does a VMEXIT
That’s right.

Pavel Dovgalyuk

From: Arnabjyoti Kalita [mailto:akal...@cs.stonybrook.edu]
Sent: Friday, June 01, 2018 11:27 AM
To: Pavel Dovgalyuk
Cc: Stefan Hajnoczi; qemu-devel@nongnu.org; Pavel Dovgalyuk
Subject: Re: [Qemu-devel] Recording I/O activity after KVM does a VMEXIT

Dear Pavel,

Thank you for providing me with all the details.

Let us take the example of a network packet. In icount mode, when the
network backend receives a network packet, you record the whole packet
with the help of the replay filter. This packet is written to the log
file. When the time comes for replay, you stop accepting any packets
from the network backend and directly inject all of the packets that you
have already recorded in the log file into the guest address space
memory.

Am I correct in understanding this?

Thanks and Regards,
Arnab

On Fri, Jun 1, 2018 at 1:31 AM, Pavel Dovgalyuk wrote:

Hi,

I’m not familiar with KVM, but I know of successful attempts at replaying
the execution by logging IO and MMIO in TCG mode.

The difference between CPU I/O and VM I/O is the following. In icount we
record anything coming into the VM, but not into the CPU. It means that
the whole packet is recorded. Virtual hardware behaves deterministically
and therefore the CPU will get identical input in case of replay, because
the whole recorded packet is injected again by the filter.

Pavel Dovgalyuk

From: Arnabjyoti Kalita [mailto:akal...@cs.stonybrook.edu]
Sent: Thursday, May 31, 2018 11:14 PM
To: Pavel Dovgalyuk
Cc: Stefan Hajnoczi; qemu-devel@nongnu.org; Pavel Dovgalyuk
Subject: Re: [Qemu-devel] Recording I/O activity after KVM does a VMEXIT

Dear Pavel,

Thank you for your answer. I am not able to understand the difference
between CPU I/Os and VM I/Os. Would any network packet that comes into
the guest OS from the outside be a part of VM I/O or CPU I/O?

I am only interested in "recording" and "replaying" those network
packets that come from the outside into the networking backend and not
the other way around. Say, for example, when I get a VMExit because of
the arrival of a network packet, I will use the VMExit reason
"KVM_EXIT_MMIO" to trace back to "e1000_mmio_write()", which I expect
should be enough to record network packets that come from the outside
and write to the guest address space for "e1000" devices. In such a
case, I think I will not have to use the "network-filter" backend that
you use to record VM I/O only. Let me know if you find errors in my
approach.

I will try to see how I can record disk packets. If disk packets use
other ways of writing to the guest memory apart from a normal VMExit, I
will try to find it out. Eventually I hope that it will use one of the
available disk front-end functions to write to the guest memory from the
disk, just like e1000 does with an "e1000_mmio_write()" call.

Thanks and best regards,
Arnab

On Thu, May 31, 2018 at 8:44 AM, Pavel Dovgalyuk wrote:

> From: Stefan Hajnoczi [mailto:stefa...@gmail.com]
> On Wed, May 30, 2018 at 11:19:13PM -0400, Arnabjyoti Kalita wrote:
> > I am trying to implement a 'minimal' record-replay mechanism for KVM, which
> > is similar to the one existing for TCG via -icount. I am trying to record
> > I/O events only (specifically disk and network events) when KVM does a
> > VMEXIT. This has led me to the function kvm_cpu_exec where I can clearly
> > see the different ways of handling all of the possible VMExit cases (like
> > PIO, MMIO etc.). To record network packets, I am working with the e1000
> > hardware device.
> >
> > Can I make sure that all of the network I/O, at least for the e1000 device,
> > happens through the KVM_EXIT_MMIO case and subsequent use of the
> > address_space_rw() function ? Do I also need to look at other functions as
> > well ? Also for recording disk activity, can I make sure that looking out
> > for the KVM_EXIT_MMIO and/or KVM_EXIT_PIO cases in the vmexit mechanism,
> > will be enough ?
> >
> > Let me know if there are other details that I need to take care of. I am
> > using QEMU 2.11 on a x86-64 CPU and the guest runs a Linux Kernel 4.4 with
> > Ubuntu 16.04.

The main icount-based record/replay advantage is that we don't record
any CPU IO. We record only VM IO (e.g., by using the network filter).

Disk devices may transfer data to CPU using DMA, therefore intercepting
only VMExit cases will not be enough.

Pavel Dovgalyuk
Re: [Qemu-devel] [PATCH v8 00/11] qemu-img convert with copy offloading
On Fri, 06/01 10:37, Stefan Hajnoczi wrote:
> On Thu, May 31, 2018 at 11:45:17PM -0700, no-re...@patchew.org wrote:
> > /var/tmp/patchew-tester-tmp-2l7s8dte/src/block/iscsi.c: In function
> > ‘iscsi_populate_target_desc’:
> > /var/tmp/patchew-tester-tmp-2l7s8dte/src/block/iscsi.c:2242:15: error:
> > ‘IDENT_DESCR_TGT_DESCR’ undeclared (first use in this function); did you
> > mean ‘IDENT_DESCR_TGT_DESCR_SIZE’?
> >      desc[0] = IDENT_DESCR_TGT_DESCR;
> >                ^
> >                IDENT_DESCR_TGT_DESCR_SIZE
> > /var/tmp/patchew-tester-tmp-2l7s8dte/src/block/iscsi.c:2242:15: note: each
> > undeclared identifier is reported only once for each function it appears in
>
> Fam, is this failure expected?

Nope. See my other reply (v9 posted).

Fam
Re: [Qemu-devel] [PATCH v2 06/20] 9p: Avoid warning if FS_IOC_GETVERSION is not defined
On Thu, 31 May 2018 21:26:01 -0400 Keno Fischer wrote:
> Both `stbuf` and `local_ioc_getversion` were unused when
> FS_IOC_GETVERSION was not defined, causing a compiler warning.
>
> Reorganize the code to avoid this warning.
>
> Signed-off-by: Keno Fischer
> ---
>
> Changes since v1:
>  * As requested in review, logic is factored into a
>    local_ioc_getversion_init function.
>
>  hw/9pfs/9p-local.c | 43 +++++++++++++++++++++++++------------------
>  1 file changed, 25 insertions(+), 18 deletions(-)
>
> diff --git a/hw/9pfs/9p-local.c b/hw/9pfs/9p-local.c
> index 576c8e3..6222891 100644
> --- a/hw/9pfs/9p-local.c
> +++ b/hw/9pfs/9p-local.c
> @@ -1375,10 +1375,10 @@ static int local_unlinkat(FsContext *ctx, V9fsPath *dir,
>      return ret;
>  }
>
> +#ifdef FS_IOC_GETVERSION
>  static int local_ioc_getversion(FsContext *ctx, V9fsPath *path,
>                                  mode_t st_mode, uint64_t *st_gen)
>  {
> -#ifdef FS_IOC_GETVERSION
>      int err;
>      V9fsFidOpenState fid_open;
>
> @@ -1397,32 +1397,19 @@ static int local_ioc_getversion(FsContext *ctx, V9fsPath *path,
>      err = ioctl(fid_open.fd, FS_IOC_GETVERSION, st_gen);
>      local_close(ctx, &fid_open);
>      return err;
> -#else
> -    errno = ENOTTY;
> -    return -1;
> -#endif
>  }
> +#endif
>
> -static int local_init(FsContext *ctx, Error **errp)
> +static int local_ioc_getversion_init(FsContext *ctx, LocalData *data)
>  {
> +#ifdef FS_IOC_GETVERSION
>      struct statfs stbuf;
> -    LocalData *data = g_malloc(sizeof(*data));
>
> -    data->mountfd = open(ctx->fs_root, O_DIRECTORY | O_RDONLY);
> -    if (data->mountfd == -1) {
> -        error_setg_errno(errp, errno, "failed to open '%s'", ctx->fs_root);
> -        goto err;
> -    }
> -
> -#ifdef FS_IOC_GETVERSION
>      /*
>       * use ioc_getversion only if the ioctl is definied
>       */
>      if (fstatfs(data->mountfd, &stbuf) < 0) {
> -        close_preserve_errno(data->mountfd);
> -        error_setg_errno(errp, errno,
> -            "failed to stat file system at '%s'", ctx->fs_root);
> -        goto err;

Hmm, I'd prefer to keep the error_setg_errno() with fstatfs(), i.e. add
an errp argument to this function.

> +        return -1;
>      }
>      switch (stbuf.f_type) {
>      case EXT2_SUPER_MAGIC:
> @@ -1433,6 +1420,26 @@ static int local_init(FsContext *ctx, Error **errp)
>          break;
>      }
>  #endif
> +    return 0;
> +}
> +
> +static int local_init(FsContext *ctx, Error **errp)
> +{
> +    LocalData *data = g_malloc(sizeof(*data));
> +
> +    data->mountfd = open(ctx->fs_root, O_DIRECTORY | O_RDONLY);
> +    if (data->mountfd == -1) {
> +        error_setg_errno(errp, errno, "failed to open '%s'", ctx->fs_root);
> +        goto err;
> +    }
> +
> +    if (local_ioc_getversion_init(ctx, data) < 0) {
> +        close_preserve_errno(data->mountfd);

And this could even be a plain close()

> +        error_setg_errno(errp, errno,
> +            "failed initialize ioc_getversion for file system at '%s'",

True, but I think "failed to stat file system" is more meaningful,
especially with the errno.

> +            ctx->fs_root);
> +        goto err;
> +    }
>
>      if (ctx->export_flags & V9FS_SM_PASSTHROUGH) {
>          ctx->xops = passthrough_xattr_ops;
Re: [Qemu-devel] [PATCH 2/2] backup: Use copy offloading
On Thu, May 31, 2018 at 10:34:45AM +0800, Fam Zheng wrote:
> The implementation is similar to the 'qemu-img convert'. In the
> beginning of the job, offloaded copy is attempted. If it fails, further
> I/O will go through the existing bounce buffer code path.
>
> Signed-off-by: Fam Zheng
> ---
>  block/backup.c     | 93 +++---
>  block/trace-events |  1 +
>  2 files changed, 62 insertions(+), 32 deletions(-)
>
> diff --git a/block/backup.c b/block/backup.c
> index 4e228e959b..ab189693f4 100644
> --- a/block/backup.c
> +++ b/block/backup.c
> @@ -45,6 +45,8 @@ typedef struct BackupBlockJob {
>      QLIST_HEAD(, CowRequest) inflight_reqs;
>
>      HBitmap *copy_bitmap;
> +    bool use_copy_range;
> +    int64_t copy_range_size;
>  } BackupBlockJob;
>
>  static const BlockJobDriver backup_job_driver;
> @@ -111,49 +113,70 @@ static int coroutine_fn backup_do_cow(BackupBlockJob *job,
>      cow_request_begin(&cow_request, job, start, end);
>
>      for (; start < end; start += job->cluster_size) {
> +retry:

This for loop is becoming complex. Please introduce helper functions.
The loop body can be replaced with something like this:

  if (!hbitmap_get(job->copy_bitmap, start / job->cluster_size)) {
      trace_backup_do_cow_skip(job, start);
      continue; /* already copied */
  }

  trace_backup_do_cow_process(job, start);

  ret = -ENOTSUP;
  if (job->use_copy_range) {
      ret = cow_with_offload(...);
  }
  if (ret < 0) {
      job->use_copy_range = false;
      ret = cow_with_bounce_buffer(...);
  }
  if (ret < 0) {
      trace_backup_do_cow_write_fail(job, start, ret);
      goto out;
  }
Re: [Qemu-devel] [PATCH v7 0/5] virtio-balloon: free page hint reporting support
On Fri, Jun 01, 2018 at 03:29:45PM +0800, Wei Wang wrote:
> On 06/01/2018 01:07 PM, Peter Xu wrote:
> > On Fri, Jun 01, 2018 at 12:58:24PM +0800, Peter Xu wrote:
> > > On Tue, Apr 24, 2018 at 02:13:43PM +0800, Wei Wang wrote:
> > > > This is the device part implementation to add a new feature,
> > > > VIRTIO_BALLOON_F_FREE_PAGE_HINT to the virtio-balloon device. The device
> > > > receives the guest free page hints from the driver and clears the
> > > > corresponding bits in the dirty bitmap, so that those free pages are
> > > > not transferred by the migration thread to the destination.
> > > >
> > > > - Test Environment
> > > >      Host: Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
> > > >      Guest: 8G RAM, 4 vCPU
> > > >      Migration setup: migrate_set_speed 100G, migrate_set_downtime 2
> > > >      second
> > > >
> > > > - Test Results
> > > >      - Idle Guest Live Migration Time (results are averaged over 10
> > > >        runs):
> > > >          - Optimization v.s. Legacy = 271ms vs 1769ms --> ~86% reduction
> > > >      - Guest with Linux Compilation Workload (make bzImage -j4):
> > > >          - Live Migration Time (average)
> > > >            Optimization v.s. Legacy = 1265ms v.s. 2634ms --> ~51%
> > > >            reduction
> > > >          - Linux Compilation Time
> > > >            Optimization v.s. Legacy = 4min56s v.s. 5min3s
> > > >            --> no obvious difference
> > > >
> > > > - Source Code
> > > >      - QEMU: https://github.com/wei-w-wang/qemu-free-page-lm.git
> > > >      - Linux: https://github.com/wei-w-wang/linux-free-page-lm.git
> > > Hi, Wei,
> > >
> > > I have a very high-level question to the series.
> > >
> > > IIUC the core idea for this series is that we can avoid sending some
> > > of the pages if we know that we don't need to send them. I think this
> > > is based on the fact that on the destination side all the pages are by
> > > default zero after they are malloced. While before this series, IIUC
> > > any migration will send every single page to destination, no matter
> > > whether it's zeroed or not. So I'm uncertain about whether this will
> > > affect the received bitmap on the destination side. Say, before this
> > > series, the received bitmap will directly cover the whole RAM bitmap
> > > after migration is finished, now it won't. Will there be any side
> > > effect? I don't see obvious issue now, but just raise this question
> > > up.
> > >
> > > Meanwhile, this reminds me about a more funny idea: whether we can
> > > just avoid sending the zero pages directly from QEMU's perspective.
> > > In other words, can we just do nothing if save_zero_page() detected
> > > that the page is zero (I guess the is_zero_range() can be fast too,
> > > but I don't know exactly how fast it is)? And how that would be
> > > differed from this page hinting way in either performance and other
> > > aspects.
> > I noticed a problem (after I wrote the above paragraph 5 minutes
> > ago...): when a page was valid and sent to the destination (with
> > non-zero data), however after a while that page was zeroed. Then if
> > we don't send zero pages at all, we won't send the page after it's
> > zeroed. Then on the destination side we'll have a stale non-zero
> > page. Is my understanding correct? Will that be a problem to this
> > series too where a valid page can be possibly freed and hinted?
>
> I think that won't be an issue either for zero page optimization or this
> free page optimization.
>
> For the zero page optimization, QEMU always sends compressed 0s to the
> destination. The zero page is detected at the time QEMU checks it (before
> sending the page). If it is a 0 page, QEMU compresses all 0s (actually just
> a flag) and sends it.

What I meant is: can we just not send even that ZERO flag at all? :)

>
> For the free page optimization, we skip free pages (could be thought of as 0
> pages in this context). The zero pages are detected at the time guest
> reports it to QEMU. The page won't be reported if it is non-zero (i.e. used).

Sorry I must have not explained myself well.
Let's assume the page hint is used. I meant this:

- start precopy, page P is non-zero (let's say the page has content P1, which is non-zero)
- we send page P with content P1 on src, so the latest destination copy of page P is P1
- page P is freed by the guest, then it becomes zero; the dirty bit of P is set since it changed (from P1 to a zeroed page)
- page P is provided as a hint that we can skip it since it's zeroed, so the dirty bit of P is cleared
- ... (page P is never used until migration completes)

After migration completes, page P should be a zeroed page on the source, while IIUC on the destination side it still holds the stale data P1. Did I miss anything important?

Thanks,

--
Peter Xu
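The sequence above can be replayed with a toy model. This is only a sketch of the scenario as described in this message, not QEMU code: `struct page_sim` and every function name here are hypothetical, and the model takes the message's assumption that freeing the page zeroes it and sets its dirty bit. Whether the resulting mismatch matters for a page the guest has freed is the open question of the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define SIM_PAGE_SIZE 8  /* tiny illustrative page size, not a real one */

/* Hypothetical model of a single guest page during precopy. */
struct page_sim {
    unsigned char src[SIM_PAGE_SIZE]; /* content on the source */
    unsigned char dst[SIM_PAGE_SIZE]; /* content on the destination, starts zeroed */
    bool dirty;                       /* migration dirty bit for this page */
};

/* The guest writes the page: content changes and the dirty bit is set. */
static void guest_write(struct page_sim *p, unsigned char byte)
{
    assert(p != NULL);
    memset(p->src, byte, SIM_PAGE_SIZE);
    p->dirty = true;
}

/* One precopy pass: the page is sent only if its dirty bit is set. */
static void migrate_pass(struct page_sim *p)
{
    assert(p != NULL);
    if (p->dirty) {
        memcpy(p->dst, p->src, SIM_PAGE_SIZE);
        p->dirty = false;
    }
}

/* Free page hint: the dirty bit is cleared without sending anything. */
static void free_page_hint(struct page_sim *p)
{
    assert(p != NULL);
    p->dirty = false;
}

/* Replays the steps; returns true if source and destination end up different. */
static bool replay_scenario(bool use_hint)
{
    struct page_sim p;

    memset(&p, 0, sizeof(p));
    guest_write(&p, 0xAB);   /* page P has non-zero content P1 */
    migrate_pass(&p);        /* destination now holds P1 */
    guest_write(&p, 0x00);   /* page freed and zeroed (assumption of the message) */
    if (use_hint) {
        free_page_hint(&p);  /* hint clears the dirty bit */
    }
    migrate_pass(&p);        /* final pass before migration completes */
    return memcmp(p.src, p.dst, SIM_PAGE_SIZE) != 0;
}
```

In this model, `replay_scenario(true)` leaves P1 on the destination while the source page is zero, and `replay_scenario(false)` leaves both sides identical.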
Re: [Qemu-devel] [PATCH v2 07/20] 9p: Move a couple xattr functions to 9p-util
On Thu, 31 May 2018 21:26:02 -0400 Keno Fischer wrote: > These functions will need custom implementations on Darwin. Since the > implementation is very similar among all of them, and 9p-util already > has the _nofollow version of fgetxattrat, let's move them all there. > > Signed-off-by: Keno Fischer > --- > This cleanup makes sense irrespective of the rest of the series. Applied to 9p-next. Thanks! > Changes since v1: > * fgetxattr_follow is dropped in favor of a different approach >later in the series. > > hw/9pfs/9p-util.c | 33 + > hw/9pfs/9p-util.h | 4 > hw/9pfs/9p-xattr.c | 33 - > 3 files changed, 37 insertions(+), 33 deletions(-) > > diff --git a/hw/9pfs/9p-util.c b/hw/9pfs/9p-util.c > index f709c27..614b7fc 100644 > --- a/hw/9pfs/9p-util.c > +++ b/hw/9pfs/9p-util.c > @@ -24,3 +24,36 @@ ssize_t fgetxattrat_nofollow(int dirfd, const char > *filename, const char *name, > g_free(proc_path); > return ret; > } > + > +ssize_t flistxattrat_nofollow(int dirfd, const char *filename, > + char *list, size_t size) > +{ > +char *proc_path = g_strdup_printf("/proc/self/fd/%d/%s", dirfd, > filename); > +int ret; > + > +ret = llistxattr(proc_path, list, size); > +g_free(proc_path); > +return ret; > +} > + > +ssize_t fremovexattrat_nofollow(int dirfd, const char *filename, > +const char *name) > +{ > +char *proc_path = g_strdup_printf("/proc/self/fd/%d/%s", dirfd, > filename); > +int ret; > + > +ret = lremovexattr(proc_path, name); > +g_free(proc_path); > +return ret; > +} > + > +int fsetxattrat_nofollow(int dirfd, const char *filename, const char *name, > + void *value, size_t size, int flags) > +{ > +char *proc_path = g_strdup_printf("/proc/self/fd/%d/%s", dirfd, > filename); > +int ret; > + > +ret = lsetxattr(proc_path, name, value, size, flags); > +g_free(proc_path); > +return ret; > +} > diff --git a/hw/9pfs/9p-util.h b/hw/9pfs/9p-util.h > index dc0d2e2..79ed6b2 100644 > --- a/hw/9pfs/9p-util.h > +++ b/hw/9pfs/9p-util.h > @@ -60,5 +60,9 @@ ssize_t 
fgetxattrat_nofollow(int dirfd, const char *path, > const char *name, > void *value, size_t size); > int fsetxattrat_nofollow(int dirfd, const char *path, const char *name, > void *value, size_t size, int flags); > +ssize_t flistxattrat_nofollow(int dirfd, const char *filename, > + char *list, size_t size); > +ssize_t fremovexattrat_nofollow(int dirfd, const char *filename, > +const char *name); > > #endif > diff --git a/hw/9pfs/9p-xattr.c b/hw/9pfs/9p-xattr.c > index d05c1a1..c696d8f 100644 > --- a/hw/9pfs/9p-xattr.c > +++ b/hw/9pfs/9p-xattr.c > @@ -60,17 +60,6 @@ ssize_t pt_listxattr(FsContext *ctx, const char *path, > return name_size; > } > > -static ssize_t flistxattrat_nofollow(int dirfd, const char *filename, > - char *list, size_t size) > -{ > -char *proc_path = g_strdup_printf("/proc/self/fd/%d/%s", dirfd, > filename); > -int ret; > - > -ret = llistxattr(proc_path, list, size); > -g_free(proc_path); > -return ret; > -} > - > /* > * Get the list and pass to each layer to find out whether > * to send the data or not > @@ -196,17 +185,6 @@ ssize_t pt_getxattr(FsContext *ctx, const char *path, > const char *name, > return local_getxattr_nofollow(ctx, path, name, value, size); > } > > -int fsetxattrat_nofollow(int dirfd, const char *filename, const char *name, > - void *value, size_t size, int flags) > -{ > -char *proc_path = g_strdup_printf("/proc/self/fd/%d/%s", dirfd, > filename); > -int ret; > - > -ret = lsetxattr(proc_path, name, value, size, flags); > -g_free(proc_path); > -return ret; > -} > - > ssize_t local_setxattr_nofollow(FsContext *ctx, const char *path, > const char *name, void *value, size_t size, > int flags) > @@ -235,17 +213,6 @@ int pt_setxattr(FsContext *ctx, const char *path, const > char *name, void *value, > return local_setxattr_nofollow(ctx, path, name, value, size, flags); > } > > -static ssize_t fremovexattrat_nofollow(int dirfd, const char *filename, > - const char *name) > -{ > -char *proc_path = 
g_strdup_printf("/proc/self/fd/%d/%s", dirfd, > filename); > -int ret; > - > -ret = lremovexattr(proc_path, name); > -g_free(proc_path); > -return ret; > -} > - > ssize_t local_removexattr_nofollow(FsContext *ctx, const char *path, > const char *name) > {
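The helpers moved by this patch all share one shape: build a `/proc/self/fd/<dirfd>/<filename>` path, call the `l*xattr()` variant on it, free the path. A minimal standalone sketch of that shape, using plain libc instead of glib, might look like the following. The names `proc_path_for` and `list_xattrs_nofollow` are made up for illustration, and the code is Linux-only, like the originals.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/xattr.h>

/*
 * Build "/proc/self/fd/<dirfd>/<filename>". The caller frees the result.
 * Returns NULL if memory allocation fails.
 */
static char *proc_path_for(int dirfd, const char *filename)
{
    assert(filename != NULL);

    /* prefix + room for the decimal digits of an int + '/' + name + NUL */
    size_t cap = strlen("/proc/self/fd/") + 3 * sizeof(int) + 2
                 + strlen(filename) + 1;
    char *path = malloc(cap);

    if (path == NULL) {
        return NULL;
    }
    snprintf(path, cap, "/proc/self/fd/%d/%s", dirfd, filename);
    return path;
}

/* Same shape as flistxattrat_nofollow() in the patch: path, syscall, free. */
static ssize_t list_xattrs_nofollow(int dirfd, const char *filename,
                                    char *list, size_t size)
{
    char *path = proc_path_for(dirfd, filename);
    ssize_t ret;
    int saved_errno;

    if (path == NULL) {
        errno = ENOMEM;
        return -1;
    }
    ret = llistxattr(path, list, size);
    saved_errno = errno;   /* keep the syscall's errno across free() */
    free(path);
    errno = saved_errno;
    return ret;
}
```

Because the path construction is the only part that differs from a plain `llistxattr()` call, it is also the part that would presumably need a different implementation on a host without `/proc/self/fd`.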
Re: [Qemu-devel] [PATCH v7 3/5] migration: API to clear bits of guest free pages from the dirty bitmap
On Fri, Jun 01, 2018 at 03:36:01PM +0800, Wei Wang wrote: > On 06/01/2018 12:00 PM, Peter Xu wrote: > > On Tue, Apr 24, 2018 at 02:13:46PM +0800, Wei Wang wrote: > > > This patch adds an API to clear bits corresponding to guest free pages > > > from the dirty bitmap. Spilt the free page block if it crosses the QEMU > > > RAMBlock boundary. > > > > > > Signed-off-by: Wei Wang > > > CC: Dr. David Alan Gilbert > > > CC: Juan Quintela > > > CC: Michael S. Tsirkin > > > --- > > > include/migration/misc.h | 2 ++ > > > migration/ram.c | 44 > > > > > > 2 files changed, 46 insertions(+) > > > > > > diff --git a/include/migration/misc.h b/include/migration/misc.h > > > index 4ebf24c..113320e 100644 > > > --- a/include/migration/misc.h > > > +++ b/include/migration/misc.h > > > @@ -14,11 +14,13 @@ > > > #ifndef MIGRATION_MISC_H > > > #define MIGRATION_MISC_H > > > +#include "exec/cpu-common.h" > > > #include "qemu/notify.h" > > > /* migration/ram.c */ > > > void ram_mig_init(void); > > > +void qemu_guest_free_page_hint(void *addr, size_t len); > > > /* migration/block.c */ > > > diff --git a/migration/ram.c b/migration/ram.c > > > index 9a72b1a..0147548 100644 > > > --- a/migration/ram.c > > > +++ b/migration/ram.c > > > @@ -2198,6 +2198,50 @@ static int ram_init_all(RAMState **rsp) > > > } > > > /* > > > + * This function clears bits of the free pages reported by the caller > > > from the > > > + * migration dirty bitmap. @addr is the host address corresponding to the > > > + * start of the continuous guest free pages, and @len is the total bytes > > > of > > > + * those pages. > > > + */ > > > +void qemu_guest_free_page_hint(void *addr, size_t len) > > > +{ > > > +RAMBlock *block; > > > +ram_addr_t offset; > > > +size_t used_len, start, npages; > > Do we need to check here on whether a migration is in progress? 
Since > > if not I'm not sure whether this hint still makes any sense any more, > > and more importantly it seems to me that block->bmap below at [1] is > > only valid during a migration. So I'm not sure whether QEMU will > > crash if this function is called without a running migration. > > OK. How about just adding comments above to have users noted that this > function should be used during migration? > > If we want to do a sanity check here, I think it would be easier to just > check !block->bmap here. I think the faster way might be that we check against the migration state. > > > > > > > + > > > +for (; len > 0; len -= used_len) { > > > +block = qemu_ram_block_from_host(addr, false, &offset); > > > +if (unlikely(!block)) { > > > +return; > > We should never reach here, should we? Assuming the callers of this > > function should always pass in a correct host address. If we are very > > sure that the host addr should be valid, could we just assert? > > Probably not the case, because of the corner case that the memory would be > hot unplugged after the free page is reported to QEMU. Question: Do we allow to do hot plug/unplug for memory during migration? > > > > > > > > +} > > > + > > > +/* > > > + * This handles the case that the RAMBlock is resized after the > > > free > > > + * page hint is reported. > > > + */ > > > +if (unlikely(offset > block->used_length)) { > > > +return; > > > +} > > > + > > > +if (len <= block->used_length - offset) { > > > +used_len = len; > > > +} else { > > > +used_len = block->used_length - offset; > > > +addr += used_len; > > > +} > > > + > > > +start = offset >> TARGET_PAGE_BITS; > > > +npages = used_len >> TARGET_PAGE_BITS; > > > + > > > +qemu_mutex_lock(&ram_state->bitmap_mutex); > > So now I think I understand the lock can still be meaningful since > > this function now can be called outside the migration thread (e.g., in > > vcpu thread). 
> > But still it would be nice to mention it somewhere on

(Actually, after reading the next patch I think it's in the iothread, so I'd better read over the whole series before replying next time :)

> > the truth of the lock.
> >
>
> Yes. Thanks for the reminder. I will add some explanation to the patch 2
> commit log.

Thanks,

--
Peter Xu
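The clipping step in the quoted `qemu_guest_free_page_hint()` hunk (split the hinted range when it crosses the end of a RAMBlock, then convert bytes to pages) can be isolated as a small pure function. The sketch below is not the patch's code: `clip_hint_to_block`, `struct hint_range` and `SKETCH_PAGE_BITS` are hypothetical names, and a 4 KiB page is assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_BITS 12  /* assumes 4 KiB target pages */

/* Page range inside one block whose dirty bits would be cleared. */
struct hint_range {
    size_t start_page;
    size_t npages;
};

/*
 * Clip the hinted byte range [offset, offset + len) to a block that has
 * used_length valid bytes. Returns false if the offset already lies past
 * the used part of the block (e.g. after a resize). On success, *consumed
 * is the number of bytes of the hint covered by this block.
 */
static bool clip_hint_to_block(uint64_t offset, size_t len,
                               uint64_t used_length,
                               struct hint_range *out, size_t *consumed)
{
    assert(out != NULL && consumed != NULL);

    if (offset > used_length) {
        return false;
    }

    uint64_t room = used_length - offset;
    size_t used_len = len <= room ? len : (size_t)room;

    out->start_page = (size_t)(offset >> SKETCH_PAGE_BITS);
    out->npages = used_len >> SKETCH_PAGE_BITS;
    *consumed = used_len;
    return true;
}
```

A caller that loops over blocks, subtracting `*consumed` from the remaining length, would presumably also want to stop when `*consumed` comes back as 0 (offset exactly equal to `used_length`), since otherwise the remaining length never shrinks.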
Re: [Qemu-devel] [PATCH v2 08/20] 9p: Rename 9p-util -> 9p-util-linux
On Thu, 31 May 2018 21:26:03 -0400 Keno Fischer wrote: > The current file only has the Linux versions of these functions. > Rename the file accordingly and update the Makefile to only build > it on Linux. A Darwin version of these will follow later in the > series. > > Signed-off-by: Keno Fischer > --- > Reviewed-by: Greg Kurz > Changes since v1: New patch > > hw/9pfs/9p-util-linux.c | 59 > + > hw/9pfs/9p-util.c | 59 > - > hw/9pfs/Makefile.objs | 3 ++- > 3 files changed, 61 insertions(+), 60 deletions(-) > create mode 100644 hw/9pfs/9p-util-linux.c > delete mode 100644 hw/9pfs/9p-util.c > > diff --git a/hw/9pfs/9p-util-linux.c b/hw/9pfs/9p-util-linux.c > new file mode 100644 > index 000..defa3a4 > --- /dev/null > +++ b/hw/9pfs/9p-util-linux.c > @@ -0,0 +1,59 @@ > +/* > + * 9p utilities (Linux Implementation) > + * > + * Copyright IBM, Corp. 2017 > + * > + * Authors: > + * Greg Kurz > + * > + * This work is licensed under the terms of the GNU GPL, version 2 or later. > + * See the COPYING file in the top-level directory. 
> + */ > + > +#include "qemu/osdep.h" > +#include "qemu/xattr.h" > +#include "9p-util.h" > + > +ssize_t fgetxattrat_nofollow(int dirfd, const char *filename, const char > *name, > + void *value, size_t size) > +{ > +char *proc_path = g_strdup_printf("/proc/self/fd/%d/%s", dirfd, > filename); > +int ret; > + > +ret = lgetxattr(proc_path, name, value, size); > +g_free(proc_path); > +return ret; > +} > + > +ssize_t flistxattrat_nofollow(int dirfd, const char *filename, > + char *list, size_t size) > +{ > +char *proc_path = g_strdup_printf("/proc/self/fd/%d/%s", dirfd, > filename); > +int ret; > + > +ret = llistxattr(proc_path, list, size); > +g_free(proc_path); > +return ret; > +} > + > +ssize_t fremovexattrat_nofollow(int dirfd, const char *filename, > +const char *name) > +{ > +char *proc_path = g_strdup_printf("/proc/self/fd/%d/%s", dirfd, > filename); > +int ret; > + > +ret = lremovexattr(proc_path, name); > +g_free(proc_path); > +return ret; > +} > + > +int fsetxattrat_nofollow(int dirfd, const char *filename, const char *name, > + void *value, size_t size, int flags) > +{ > +char *proc_path = g_strdup_printf("/proc/self/fd/%d/%s", dirfd, > filename); > +int ret; > + > +ret = lsetxattr(proc_path, name, value, size, flags); > +g_free(proc_path); > +return ret; > +} > diff --git a/hw/9pfs/9p-util.c b/hw/9pfs/9p-util.c > deleted file mode 100644 > index 614b7fc..000 > --- a/hw/9pfs/9p-util.c > +++ /dev/null > @@ -1,59 +0,0 @@ > -/* > - * 9p utilities > - * > - * Copyright IBM, Corp. 2017 > - * > - * Authors: > - * Greg Kurz > - * > - * This work is licensed under the terms of the GNU GPL, version 2 or later. > - * See the COPYING file in the top-level directory. 
> - */ > - > -#include "qemu/osdep.h" > -#include "qemu/xattr.h" > -#include "9p-util.h" > - > -ssize_t fgetxattrat_nofollow(int dirfd, const char *filename, const char > *name, > - void *value, size_t size) > -{ > -char *proc_path = g_strdup_printf("/proc/self/fd/%d/%s", dirfd, > filename); > -int ret; > - > -ret = lgetxattr(proc_path, name, value, size); > -g_free(proc_path); > -return ret; > -} > - > -ssize_t flistxattrat_nofollow(int dirfd, const char *filename, > - char *list, size_t size) > -{ > -char *proc_path = g_strdup_printf("/proc/self/fd/%d/%s", dirfd, > filename); > -int ret; > - > -ret = llistxattr(proc_path, list, size); > -g_free(proc_path); > -return ret; > -} > - > -ssize_t fremovexattrat_nofollow(int dirfd, const char *filename, > -const char *name) > -{ > -char *proc_path = g_strdup_printf("/proc/self/fd/%d/%s", dirfd, > filename); > -int ret; > - > -ret = lremovexattr(proc_path, name); > -g_free(proc_path); > -return ret; > -} > - > -int fsetxattrat_nofollow(int dirfd, const char *filename, const char *name, > - void *value, size_t size, int flags) > -{ > -char *proc_path = g_strdup_printf("/proc/self/fd/%d/%s", dirfd, > filename); > -int ret; > - > -ret = lsetxattr(proc_path, name, value, size, flags); > -g_free(proc_path); > -return ret; > -} > diff --git a/hw/9pfs/Makefile.objs b/hw/9pfs/Makefile.objs > index fd90b62..083508f 100644 > --- a/hw/9pfs/Makefile.objs > +++ b/hw/9pfs/Makefile.objs > @@ -1,4 +1,5 @@ > -common-obj-y = 9p.o 9p-util.o > +common-obj-y = 9p.o > +common-obj-$(CONFIG_LINUX) += 9p-util-linux.o > common-obj-y += 9p-local.o 9p-xattr.o > common-obj-y += 9p-xattr-user.o 9p-posix-acl.o > common-obj-y += coth.o cofs.o codir.o cofile.o
Re: [Qemu-devel] [PATCH v2 09/20] 9p: Properly check/translate flags in unlinkat
On Thu, 31 May 2018 21:26:04 -0400 Keno Fischer wrote: > This code previously relied on P9_DOTL_AT_REMOVEDIR and AT_REMOVEDIR > having the same numerical value and deferred any errorchecking to the > syscall itself. However, while the former assumption is true on Linux, > it is not true in general. Thus, add appropriate error checking and > translation to the 9p unlinkat server code. > > Signed-off-by: Keno Fischer > --- > Looks good but handle_unlinkat() needs to be adapted to this change. Other backends (proxy and synth) seem to ignore the flags. > Changes since v1: > * Code was moved from 9p-local.c to server entry point in 9p.c > > hw/9pfs/9p.c | 13 +++-- > 1 file changed, 11 insertions(+), 2 deletions(-) > > diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c > index b80db65..a757374 100644 > --- a/hw/9pfs/9p.c > +++ b/hw/9pfs/9p.c > @@ -2522,7 +2522,7 @@ static void coroutine_fn v9fs_unlinkat(void *opaque) > { > int err = 0; > V9fsString name; > -int32_t dfid, flags; > +int32_t dfid, flags, rflags = 0; > size_t offset = 7; > V9fsPath path; > V9fsFidState *dfidp; > @@ -2549,6 +2549,15 @@ static void coroutine_fn v9fs_unlinkat(void *opaque) > goto out_nofid; > } > > +if (flags & ~P9_DOTL_AT_REMOVEDIR) { > +err = -EINVAL; > +goto out_nofid; > +} > + > +if (flags & P9_DOTL_AT_REMOVEDIR) { > +rflags |= AT_REMOVEDIR; > +} > + > dfidp = get_fid(pdu, dfid); > if (dfidp == NULL) { > err = -EINVAL; > @@ -2567,7 +2576,7 @@ static void coroutine_fn v9fs_unlinkat(void *opaque) > if (err < 0) { > goto out_err; > } > -err = v9fs_co_unlinkat(pdu, &dfidp->path, &name, flags); > +err = v9fs_co_unlinkat(pdu, &dfidp->path, &name, rflags); > if (!err) { > err = offset; > }
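The check-then-translate step added by this patch can be shown in isolation. In the sketch below the constants are stand-ins, not the real definitions: `WIRE_AT_REMOVEDIR` plays the role of `P9_DOTL_AT_REMOVEDIR`, and `HOST_AT_REMOVEDIR` is a made-up host value chosen to differ from it, so the need for translation is visible. The function name is hypothetical too.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the protocol flag (value assumed for illustration). */
#define WIRE_AT_REMOVEDIR 0x200
/* Made-up host flag value that deliberately differs from the wire value. */
#define HOST_AT_REMOVEDIR 0x80

/*
 * Reject unknown protocol flags, then map the known one to the host value.
 * Returns 0 on success and -EINVAL if any unsupported bit is set.
 */
static int translate_unlinkat_flags(int32_t wire_flags, int *host_flags)
{
    assert(host_flags != NULL);

    if (wire_flags & ~WIRE_AT_REMOVEDIR) {
        return -EINVAL;
    }
    *host_flags = (wire_flags & WIRE_AT_REMOVEDIR) ? HOST_AT_REMOVEDIR : 0;
    return 0;
}
```

Passing the wire value straight through would only work on hosts where the two constants happen to be equal, which is the assumption the commit message says holds on Linux but not in general.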
[Qemu-devel] virtio-vsock feature has no TCG (non-KVM) support
Please add an important note to the https://wiki.qemu.org/Features/VirtioVsock page stating that this feature is only supported in KVM-accelerated mode. It's not obvious. Furthermore, QEMU does not check for this when invoked with "-device vhost-vsock-pci,...", so the user only discovers it when an application communicating via AF_VSOCK fails to connect() with a confusing "Connection timed out" error.

--
Best regards,
Артем Писаренко
Re: [Qemu-devel] An emulation failure occurs, if I hotplug vcpus immediately after the VM start
On Fri, 1 Jun 2018 08:17:12 + xuyandong wrote:

> Hi there,
>
> I am doing some tests on qemu vcpu hotplug and I ran into some trouble.
> An emulation failure occurs and qemu prints the following msg:
>
> KVM internal error. Suberror: 1
> emulation failure
> EAX= EBX= ECX= EDX=0600
> ESI= EDI= EBP= ESP=fff8
> EIP=ff53 EFL=00010082 [--S] CPL=0 II=0 A20=1 SMM=0 HLT=0
> ES = 9300
> CS =f000 000f 9b00
> SS = 9300
> DS = 9300
> FS = 9300
> GS = 9300
> LDT= 8200
> TR = 8b00if
> GDT=
> IDT=
> CR0=6010 CR2= CR3= CR4=
> DR0= DR1= DR2=
> DR3=
> DR6=0ff0 DR7=0400
> EFER=
> Code=31 d2 eb 04 66 83 ca ff 66 89 d0 66 5b 66 c3 66 89 d0 66 c3 66 68
> 21 8a 00 00 e9 08 d7 66 56 66 53 66 83 ec 0c 66 89 c3 66 e8 ce 7b ff ff 66 89
> c6
>
> I notice that the guest is still running SeaBIOS in real mode when the vcpu has
> just been plugged.
> This emulation failure can be steadily reproduced if I do vcpu hotplug
> during the VM launch process.
> After some digging, I find this KVM internal error shows up because KVM
> cannot emulate some MMIO (gpa 0xfff53 ).
>
> So I am confused,
> (1) does qemu support vcpu hotplug even if the guest is running SeaBIOS?

There is no code that forbids it, and I would expect it not to trigger an error and to be a NOP.

> (2) the gpa (0xfff53) is an address in the BIOS ROM section, why does kvm
> incorrectly treat it as an mmio address?

A KVM trace and the BIOS debug log might give more information about where to look. Even better would be to debug SeaBIOS and find out exactly what goes wrong, if you can do it.
Re: [Qemu-devel] [PATCH v4 06/14] spapr: prepare for multi stage hotplug handlers
On Thu, 17 May 2018 10:15:19 +0200 David Hildenbrand wrote:

Maybe subj: make hotplug handlers use local_error

> For multi stage hotplug handlers, we'll have to do some error handling
> in some hotplug functions, so let's use a local error variable (except
> for unplug requests).
>
> Also, add code to pass control to the final stage hotplug handler at the
> parent bus.

Doing several unrelated things in one patch doesn't help reviewing it. Also, as explained in 04/14, it's not needed at all. Could you try to keep patches minimal? We can add more complexity in later revisions if it's really necessary.

> Signed-off-by: David Hildenbrand
> ---
>  hw/ppc/spapr.c | 54 +++---
>  1 file changed, 43 insertions(+), 11 deletions(-)
>
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index ebf30dd60b..b7c5c95f7a 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -3571,27 +3571,48 @@ static void spapr_machine_device_plug(HotplugHandler
> *hotplug_dev,
>  {
>      MachineState *ms = MACHINE(hotplug_dev);
>      sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(ms);
> +    Error *local_err = NULL;
>
> +    /* final stage hotplug handler */
>      if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
>          int node;
>
>          if (!smc->dr_lmb_enabled) {
> -            error_setg(errp, "Memory hotplug not supported for this
> machine");
> -            return;
> +            error_setg(&local_err,
> +                       "Memory hotplug not supported for this machine");
> +            goto out;
>          }
> -        node = object_property_get_uint(OBJECT(dev), PC_DIMM_NODE_PROP,
> errp);
> -        if (*errp) {
> -            return;
> +        node = object_property_get_uint(OBJECT(dev), PC_DIMM_NODE_PROP,
> +                                        &local_err);
> +        if (local_err) {
> +            goto out;
>          }
>          if (node < 0 || node >= MAX_NODES) {
> -            error_setg(errp, "Invaild node %d", node);
> -            return;
> +            error_setg(&local_err, "Invaild node %d", node);
> +            goto out;
>          }
>
> -        spapr_memory_plug(hotplug_dev, dev, node, errp);
> +        spapr_memory_plug(hotplug_dev, dev, node, &local_err);
>      } else if (object_dynamic_cast(OBJECT(dev), TYPE_SPAPR_CPU_CORE)) {
> -        spapr_core_plug(hotplug_dev, dev, errp);
> +        spapr_core_plug(hotplug_dev, dev, &local_err);
> +    } else if (dev->parent_bus && dev->parent_bus->hotplug_handler) {
> +        hotplug_handler_plug(dev->parent_bus->hotplug_handler, dev,
> &local_err);
> +    }
> +out:
> +    error_propagate(errp, local_err);
> +}
> +
> +static void spapr_machine_device_unplug(HotplugHandler *hotplug_dev,
> +                                        DeviceState *dev, Error **errp)
> +{
> +    Error *local_err = NULL;
> +
> +    /* final stage hotplug handler */
> +    if (dev->parent_bus && dev->parent_bus->hotplug_handler) {
> +        hotplug_handler_unplug(dev->parent_bus->hotplug_handler, dev,
> +                               &local_err);
>      }
> +    error_propagate(errp, local_err);
>  }
>
>  static void spapr_machine_device_unplug_request(HotplugHandler *hotplug_dev,
> @@ -3618,17 +3639,27 @@ static void
> spapr_machine_device_unplug_request(HotplugHandler *hotplug_dev,
>              return;
>          }
>          spapr_core_unplug_request(hotplug_dev, dev, errp);
> +    } else if (dev->parent_bus && dev->parent_bus->hotplug_handler) {
> +        hotplug_handler_unplug_request(dev->parent_bus->hotplug_handler, dev,
> +                                       errp);
>      }
>  }
>
>  static void spapr_machine_device_pre_plug(HotplugHandler *hotplug_dev,
>                                            DeviceState *dev, Error **errp)
>  {
> +    Error *local_err = NULL;
> +
> +    /* final stage hotplug handler */
>      if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
> -        spapr_memory_pre_plug(hotplug_dev, dev, errp);
> +        spapr_memory_pre_plug(hotplug_dev, dev, &local_err);
>      } else if (object_dynamic_cast(OBJECT(dev), TYPE_SPAPR_CPU_CORE)) {
> -        spapr_core_pre_plug(hotplug_dev, dev, errp);
> +        spapr_core_pre_plug(hotplug_dev, dev, &local_err);
> +    } else if (dev->parent_bus && dev->parent_bus->hotplug_handler) {
> +        hotplug_handler_pre_plug(dev->parent_bus->hotplug_handler, dev,
> +                                 &local_err);
>      }
> +    error_propagate(errp, local_err);
>  }
>
>  static HotplugHandler *spapr_get_hotplug_handler(MachineState *machine,
> @@ -3988,6 +4019,7 @@ static void spapr_machine_class_init(ObjectClass *oc,
> void *data)
>      mc->get_default_cpu_node_id = spapr_get_default_cpu_node_id;
>      mc->possible_cpu_arch_ids = spapr_possible_cpu_arch_ids;
>      hc->unplug_request = spapr_machine_device_unplug_request;
> +    hc->unplug = spapr_machine_device_unplug;
>
>      smc->dr_lmb_enabled = true;
>      mc->d
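The local error variable pattern that this patch switches to can be sketched without QEMU. One reason for the pattern is that a caller may pass a NULL `errp`, so a check such as `if (*errp)` (as in the removed lines) is not safe, while a local variable can always be inspected. Everything below is a stand-in: `SketchError`, `sketch_error_set`, `sketch_error_propagate` and `plug_node` are hypothetical and only mimic the roles of `Error`, `error_setg()` and `error_propagate()`.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for QEMU's Error object. */
typedef struct SketchError {
    char msg[64];
} SketchError;

/* Stand-in for error_setg(): allocate an error unless errp is NULL. */
static void sketch_error_set(SketchError **errp, const char *msg)
{
    assert(msg != NULL);
    if (errp == NULL) {
        return;                 /* caller is not interested in details */
    }
    SketchError *e = calloc(1, sizeof(*e));
    if (e == NULL) {
        abort();
    }
    strncpy(e->msg, msg, sizeof(e->msg) - 1);
    *errp = e;
}

/* Stand-in for error_propagate(): hand the local error to the caller. */
static void sketch_error_propagate(SketchError **dst, SketchError *local)
{
    if (local == NULL) {
        return;
    }
    if (dst != NULL && *dst == NULL) {
        *dst = local;
    } else {
        free(local);            /* nobody wants it, so drop it */
    }
}

/* A handler step that can fail; errp may legitimately be NULL. */
static int plug_node(int node, int max_nodes, SketchError **errp)
{
    SketchError *local_err = NULL;
    int ret = -1;

    if (node < 0 || node >= max_nodes) {
        sketch_error_set(&local_err, "Invalid node");
        goto out;
    }
    ret = 0;
out:
    sketch_error_propagate(errp, local_err);
    return ret;
}
```

With this shape, `plug_node(9, 4, NULL)` fails cleanly instead of dereferencing a NULL pointer, and a caller that does pass an error pointer receives the message.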
Re: [Qemu-devel] [PATCH v7 0/5] virtio-balloon: free page hint reporting support
On Fri, Jun 01, 2018 at 03:21:54PM +0800, Wei Wang wrote: > On 06/01/2018 12:58 PM, Peter Xu wrote: > > On Tue, Apr 24, 2018 at 02:13:43PM +0800, Wei Wang wrote: > > > This is the deivce part implementation to add a new feature, > > > VIRTIO_BALLOON_F_FREE_PAGE_HINT to the virtio-balloon device. The device > > > receives the guest free page hints from the driver and clears the > > > corresponding bits in the dirty bitmap, so that those free pages are > > > not transferred by the migration thread to the destination. > > > > > > - Test Environment > > > Host: Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz > > > Guest: 8G RAM, 4 vCPU > > > Migration setup: migrate_set_speed 100G, migrate_set_downtime 2 > > > second > > > > > > - Test Results > > > - Idle Guest Live Migration Time (results are averaged over 10 runs): > > > - Optimization v.s. Legacy = 271ms vs 1769ms --> ~86% reduction > > > - Guest with Linux Compilation Workload (make bzImage -j4): > > > - Live Migration Time (average) > > >Optimization v.s. Legacy = 1265ms v.s. 2634ms --> ~51% > > > reduction > > > - Linux Compilation Time > > >Optimization v.s. Legacy = 4min56s v.s. 5min3s > > >--> no obvious difference > > > > > > - Source Code > > > - QEMU: https://github.com/wei-w-wang/qemu-free-page-lm.git > > > - Linux: https://github.com/wei-w-wang/linux-free-page-lm.git > > Hi, Wei, > > > > I have a very high-level question to the series. > > Hi Peter, > > Thanks for joining the discussion :) Thanks for letting me know this thread. It's an interesting idea. :) > > > > > IIUC the core idea for this series is that we can avoid sending some > > of the pages if we know that we don't need to send them. I think this > > is based on the fact that on the destination side all the pages are by > > default zero after they are malloced. While before this series, IIUC > > any migration will send every single page to destination, no matter > > whether it's zeroed or not. 
So I'm uncertain about whether this will > > affect the received bitmap on the destination side. Say, before this > > series, the received bitmap will directly cover the whole RAM bitmap > > after migration is finished, now it's won't. Will there be any side > > effect? I don't see obvious issue now, but just raise this question > > up. > > This feature currently only supports pre-copy (I think the received bitmap > is something matters to post copy only). > That's why we have > rs->free_page_support = ..&& !migrate_postcopy(); Okay. > > > Meanwhile, this reminds me about a more funny idea: whether we can > > just avoid sending the zero pages directly from QEMU's perspective. > > In other words, can we just do nothing if save_zero_page() detected > > that the page is zero (I guess the is_zero_range() can be fast too, > > but I don't know exactly how fast it is)? And how that would be > > differed from this page hinting way in either performance and other > > aspects. > > I guess you referred to the zero page optimization. I think the major > overhead comes to the zero page checking - lots of memory accesses, which > also waste memory bandwidth. Please see the results attached in the cover > letter. The legacy case already includes the zero page optimization. I replied in the other thread. We can discuss there altogether. Actually after a second thought I think maybe what I worried there is exactly the reason why we must send the zero page flag - otherwise there can be stale non-zero page on destination. Here "zero page" and "freed page" is totally different idea since even if a page is zeroed it might still be in use (not freed)! While instead for a "free page" even if it's non-zero we might be able to not send it at all, though I am not sure whether that mismatch of data might cause any side effect too. I think the corresponding question would be: if a page is freed in Linux kernel, would its data matter any more? Thanks, -- Peter Xu
Re: [Qemu-devel] [RFC 2/3] hw/char/nrf51_uart: Implement nRF51 SoC UART
On Thu, May 31, 2018 at 2:58 PM, sundeep subbaraya wrote:
> On Wed, May 30, 2018 at 3:33 AM, Julia Suvorova via Qemu-devel wrote:
>> +static uint64_t uart_read(void *opaque, hwaddr addr, unsigned int size)
>> +{
>> +    Nrf51UART *s = NRF51_UART(opaque);
>> +    uint64_t r;
>> +
>> +    switch (addr) {
>> +    case A_RXD:
>> +        r = s->rx_fifo[s->rx_fifo_pos];
>> +        if (s->rx_fifo_len > 0) {
>> +            s->rx_fifo_pos = (s->rx_fifo_pos + 1) % UART_FIFO_LENGTH;
>> +            s->rx_fifo_len--;
>> +            qemu_chr_fe_accept_input(&s->chr);
>> +        }
>> +        break;
>> +
>> +    case A_INTENSET:
>> +    case A_INTENCLR:
>> +    case A_INTEN:
>> +        r = s->reg[A_INTEN];
>> +        break;
>> +    default:
>> +        r = s->reg[addr];
>
> You can use R_* macros for registers and access regs[ ] with addr/4 as index.
> It is better than using big regs[ ] array out of which most of
> locations go unused.

Good point. The bug is more severe than an inefficiency. s->reg[addr] allows out-of-bounds accesses. This is a security bug.

The memory region is 0x1000 *bytes* long, but the array has 0x1000 32-bit *elements*. A read from address 0xfffc results in a memory load from s->reg + 0xfffc * sizeof(s->reg[0]). That's beyond the end of the array!

s->reg[A_*] should be changed to s->reg[R_*]. s->reg[addr] needs to be s->reg[addr / sizeof(s->reg[0])].

It may be worth adding a warning to scripts/checkpatch.pl for array[A_*] so this bug is reported automatically in the future.

Stefan
Re: [Qemu-devel] [RFC 2/3] hw/char/nrf51_uart: Implement nRF51 SoC UART
On Fri, Jun 1, 2018 at 11:41 AM, Stefan Hajnoczi wrote:
> On Thu, May 31, 2018 at 2:58 PM, sundeep subbaraya
> wrote:
>> On Wed, May 30, 2018 at 3:33 AM, Julia Suvorova via Qemu-devel
>> wrote:
>>> +static uint64_t uart_read(void *opaque, hwaddr addr, unsigned int size)
>>> +{
>>> +    Nrf51UART *s = NRF51_UART(opaque);
>>> +    uint64_t r;
>>> +
>>> +    switch (addr) {
>>> +    case A_RXD:
>>> +        r = s->rx_fifo[s->rx_fifo_pos];
>>> +        if (s->rx_fifo_len > 0) {
>>> +            s->rx_fifo_pos = (s->rx_fifo_pos + 1) % UART_FIFO_LENGTH;
>>> +            s->rx_fifo_len--;
>>> +            qemu_chr_fe_accept_input(&s->chr);
>>> +        }
>>> +        break;
>>> +
>>> +    case A_INTENSET:
>>> +    case A_INTENCLR:
>>> +    case A_INTEN:
>>> +        r = s->reg[A_INTEN];
>>> +        break;
>>> +    default:
>>> +        r = s->reg[addr];
>>
>> You can use R_* macros for registers and access regs[ ] with addr/4 as index.
>> It is better than using big regs[ ] array out of which most of
>> locations go unused.
>
> Good point. The bug is more severe than an inefficiency.
> s->reg[addr] allows out-of-bounds accesses. This is a security bug.
>
> The memory region is 0x1000 *bytes* long, but the array has 0x1000
> 32-bit *elements*. A read from address 0xfffc results in a memory
> load from s->reg + 0xfffc * sizeof(s->reg[0]). That's beyond the end
> of the array!

Sorry, I was wrong. The array is large enough after all. It's just an inefficiency, but still worth fixing. Similar issues could lead to out-of-bound accesses.

Stefan
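The indexing scheme suggested in this thread (size the array by the number of 32-bit registers and index it with `addr / sizeof(reg[0])`) can be sketched as follows. This is not the nRF51 device model: `struct sketch_uart`, `sketch_reg_index` and `sketch_reg_read` are hypothetical names, and the only figure taken from the thread is the 0x1000-byte region size. The explicit range and alignment check is an extra precaution added here, not something the thread asks for.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MMIO_SIZE 0x1000u  /* region size in bytes */
#define SKETCH_NUM_REGS  (SKETCH_MMIO_SIZE / sizeof(uint32_t))

/* One array slot per 32-bit register, instead of one slot per byte offset. */
struct sketch_uart {
    uint32_t reg[SKETCH_NUM_REGS];
};

/* Convert a byte offset into an array index; false if out of range or unaligned. */
static bool sketch_reg_index(uint64_t addr, size_t *index)
{
    assert(index != NULL);
    if (addr >= SKETCH_MMIO_SIZE || (addr % sizeof(uint32_t)) != 0) {
        return false;
    }
    *index = (size_t)(addr / sizeof(uint32_t));
    return true;
}

/* Read a register by byte offset; invalid offsets read as 0 in this sketch. */
static uint32_t sketch_reg_read(const struct sketch_uart *s, uint64_t addr)
{
    size_t i;

    assert(s != NULL);
    return sketch_reg_index(addr, &i) ? s->reg[i] : 0;
}
```

With this layout the array holds 0x400 elements rather than 0x1000, which is the inefficiency the thread ends up identifying.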
Re: [Qemu-devel] [RFC] Intermediate block mirroring
On Thu 03 May 2018 02:22:41 PM CEST, Kevin Wolf wrote:
>> > Were the (more or less) exact requirements of QMP blockdev-reopen
>> > discussed? How is it different from qemu-io's "reopen" command?
>> > What are the options that you can and can not change?
>>
>> I can't quite remember, I'm afraid. I think it was supposed to be
>> pretty much qemu-io's reopen (so just bdrv_reopen()). I suppose you
>> cannot change the driver (obviously) or probably the node name, because
>> either would result in the node being replaced by a completely new one.
>>
>> Other than that, it probably depends on what the block driver supports,
>> but ideally you should be able to change everything.
>
> Honestly the design of bdrv_reopen() is quite broken because of the
> way it tries to maintain old options if they aren't specified, and
> guesses what you might mean when you add flags to the mix. The exact
> semantics are quite complicated and I'd rather avoid them in a stable
> API.
>
> A clean QMP command would probably apply the same defaults as
> blockdev-add, so you just get to specify the full options again.

I have a prototype of this working and almost ready to be published,
but there's a tricky thing with this part:

If we want blockdev-reopen to apply the defaults for all options
except the ones explicitly specified by the user, then we need to
check not just the options that are present, but also the ones that
are omitted.

For example:

   { "execute": "blockdev-add",
     "arguments": {
        "driver": "null-aio",
        "node-name": "root",
        "size": 1024
     }
   }

This adds a null-aio block device with the "size" option set to 1024
(the default is 1 << 30).

null_reopen_prepare() allows reopening that block device, but it does
not allow changing any of its options. Attempting to change the value
of "size" is detected by the loop that checks unhandled options at the
end of bdrv_reopen_prepare(), which returns "Cannot change the option
'size'".

So far, so good. We have this generic check for all options that works
with all drivers, so as long as we only specify options that we know
can be changed, everything is fine.

However, if we want blockdev-reopen to apply the default values for all
omitted options, then omitting "size" would be equivalent to setting
it to its default value (1 << 30). And if "size" cannot be changed,
then QEMU should complain unless we explicitly set "size" to 1024
again on reopen.

This complicates things a bit, because we would go from "the options
that can't be changed are the ones that are not handled by each
driver's _prepare() function" to "options that are absent can also
produce an error".

Berto
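
To make the consequence of the "omitted means default" rule concrete, here
is a minimal, self-contained sketch in C. It is not QEMU code: OptionModel,
UserOption and reopen_check() are hypothetical names invented for this
illustration, and options are modelled as plain integers rather than a
QDict. It only assumes the semantics described above: an omitted option is
treated as a request for its default value, and a request that differs from
the current value of an unchangeable option is rejected.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical model of one driver option (not a QEMU type). */
typedef struct {
    const char *name;
    int64_t default_value;  /* value applied when the option is omitted */
    int64_t current_value;  /* value the node was opened with */
    bool changeable;        /* true if the driver could change it on reopen */
} OptionModel;

/* Hypothetical option passed by the user on reopen. */
typedef struct {
    const char *name;
    int64_t value;
} UserOption;

/* Look up an option by name; returns NULL if the user omitted it. */
static const UserOption *find_user_option(const UserOption *user,
                                          size_t n_user, const char *name)
{
    for (size_t i = 0; i < n_user; i++) {
        if (strcmp(user[i].name, name) == 0) {
            return &user[i];
        }
    }
    return NULL;
}

/*
 * Walk every option the driver knows, not only the ones the user passed.
 * Returns true if the reopen would be accepted; otherwise writes an error
 * message into err and returns false.
 */
bool reopen_check(const OptionModel *model, size_t n_model,
                  const UserOption *user, size_t n_user,
                  char *err, size_t err_len)
{
    assert(err != NULL && err_len > 0);

    for (size_t i = 0; i < n_model; i++) {
        const UserOption *u = find_user_option(user, n_user, model[i].name);
        /* An omitted option is a request for the default value. */
        int64_t requested = u ? u->value : model[i].default_value;

        if (requested != model[i].current_value && !model[i].changeable) {
            snprintf(err, err_len, "Cannot change the option '%s'",
                     model[i].name);
            return false;
        }
    }
    return true;
}
```

With the null-aio node from the example (default 1 << 30, current value
1024, not changeable), calling reopen_check() with no user options fails
with "Cannot change the option 'size'", while passing "size" = 1024 again
succeeds. The point of the sketch is that the loop runs over the driver's
known options, so an absent option can produce an error.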
Re: [Qemu-devel] [PATCH v4 08/14] spapr: handle pc-dimm unplug via hotplug handler chain
On Thu, 17 May 2018 10:15:21 +0200 David Hildenbrand wrote:
> Let's handle it via hotplug_handler_unplug(). E.g. necessary to hotplug/
> unplug memory devices (which a pc-dimm is) later.
>
> Signed-off-by: David Hildenbrand
> ---
>  hw/ppc/spapr.c | 23 +++
>  1 file changed, 19 insertions(+), 4 deletions(-)
>
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 2f315f963b..286c38c842 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -3291,7 +3291,8 @@ static sPAPRDIMMState *spapr_recover_pending_dimm_state(sPAPRMachineState *ms,
>  /* Callback to be called during DRC release. */
>  void spapr_lmb_release(DeviceState *dev)
>  {
> -    sPAPRMachineState *spapr = SPAPR_MACHINE(qdev_get_hotplug_handler(dev));
> +    HotplugHandler *hotplug_ctrl = qdev_get_hotplug_handler(dev);
> +    sPAPRMachineState *spapr = SPAPR_MACHINE(hotplug_ctrl);
>      sPAPRDIMMState *ds = spapr_pending_dimm_unplugs_find(spapr, PC_DIMM(dev));
>
>      /* This information will get lost if a migration occurs
> @@ -3309,9 +3310,21 @@ void spapr_lmb_release(DeviceState *dev)
>
>      /*
>       * Now that all the LMBs have been removed by the guest, call the
> -     * pc-dimm unplug handler to cleanup up the pc-dimm device.
> +     * unplug handler chain. This can never fail.
>       */
> -    pc_dimm_memory_unplug(dev, MACHINE(spapr));
> +    hotplug_ctrl = qdev_get_hotplug_handler(dev);
> +    hotplug_handler_unplug(hotplug_ctrl, dev, &error_abort);
> +}
> +
> +static void spapr_memory_unplug(HotplugHandler *hotplug_dev, DeviceState *dev,
> +                                Error **errp)
> +{
> +    sPAPRMachineState *spapr = SPAPR_MACHINE(hotplug_dev);
> +    sPAPRDIMMState *ds = spapr_pending_dimm_unplugs_find(spapr, PC_DIMM(dev));
> +    g_assert(ds);
> +    g_assert(!ds->nr_lmbs);

These 2 lines seem to be unrelated to the patch topic, could you drop them?
If these values should be checked, it would be better to audit the use of
'ds' across spapr.c and send that as a separate patch, outside this series.

> +    pc_dimm_memory_unplug(dev, MACHINE(hotplug_dev));
>      object_unparent(OBJECT(dev));
>      spapr_pending_dimm_unplugs_remove(spapr, ds);
>  }
> @@ -3608,7 +3621,9 @@ static void spapr_machine_device_unplug(HotplugHandler *hotplug_dev,
>      Error *local_err = NULL;
>
>      /* final stage hotplug handler */
> -    if (dev->parent_bus && dev->parent_bus->hotplug_handler) {
> +    if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
> +        spapr_memory_unplug(hotplug_dev, dev, &local_err);
> +    } else if (dev->parent_bus && dev->parent_bus->hotplug_handler) {
>          hotplug_handler_unplug(dev->parent_bus->hotplug_handler, dev,
>                                 &local_err);
>      }

Otherwise, ignoring the dev->parent_bus parts, the patch looks reasonable.
Re: [Qemu-devel] [PATCH v4 09/14] spapr: handle cpu core unplug via hotplug handler chain
On Thu, 17 May 2018 10:15:22 +0200 David Hildenbrand wrote:
> Let's handle it via hotplug_handler_unplug().
>
> Signed-off-by: David Hildenbrand

Acked-by: Igor Mammedov

> ---
>  hw/ppc/spapr.c | 13 -
>  1 file changed, 12 insertions(+), 1 deletion(-)
>
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 286c38c842..13d153b5a6 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -3412,7 +3412,16 @@ static void *spapr_populate_hotplug_cpu_dt(CPUState *cs, int *fdt_offset,
>  /* Callback to be called during DRC release. */
>  void spapr_core_release(DeviceState *dev)
>  {
> -    MachineState *ms = MACHINE(qdev_get_hotplug_handler(dev));
> +    HotplugHandler *hotplug_ctrl = qdev_get_hotplug_handler(dev);
> +
> +    /* Call the unplug handler chain. This can never fail. */
> +    hotplug_handler_unplug(hotplug_ctrl, dev, &error_abort);
> +}
> +
> +static void spapr_core_unplug(HotplugHandler *hotplug_dev, DeviceState *dev,
> +                              Error **errp)
> +{
> +    MachineState *ms = MACHINE(hotplug_dev);
>      sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(ms);
>      CPUCore *cc = CPU_CORE(dev);
>      CPUArchId *core_slot = spapr_find_cpu_slot(ms, cc->core_id, NULL);
> @@ -3623,6 +3632,8 @@ static void spapr_machine_device_unplug(HotplugHandler *hotplug_dev,
>      /* final stage hotplug handler */
>      if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
>          spapr_memory_unplug(hotplug_dev, dev, &local_err);
> +    } else if (object_dynamic_cast(OBJECT(dev), TYPE_SPAPR_CPU_CORE)) {
> +        spapr_core_unplug(hotplug_dev, dev, &local_err);
>      } else if (dev->parent_bus && dev->parent_bus->hotplug_handler) {
>          hotplug_handler_unplug(dev->parent_bus->hotplug_handler, dev,
>                                 &local_err);