Re: [Qemu-devel] Re: [PATCH v2] qemu-kvm: Speed up of the dirty-bitmap-traveling
On 18.02.2010, at 06:57, OHMURA Kei wrote:

We think? I mean - yes, I think so too. But have you actually measured it? How much improvement are we talking here? Is it still faster when a bswap is involved?

Thanks for pointing out. I will post the data for x86 later. However, I don't have a test environment to check the impact of bswap. Would you please measure the run time between the following section if possible?

It'd make more sense to have a real stand alone test program, no? I can try to write one today, but I have some really nasty important bugs to fix first.

OK. I will prepare a test code with sample data. Since I found a ppc machine around, I will run the code and post the results of x86 and ppc. By the way, the following data is a result of x86 measured in QEMU/KVM. This data shows how many times the function is called (#called), runtime of the original function (orig.), runtime of this patch (patch), and the speedup ratio (ratio).

That does indeed look promising! Thanks for doing this micro-benchmark. I just want to be 100% sure that it doesn't affect performance for big endian badly.

I measured runtime of the test code with sample data. My test environment and results are described below.

x86 Test Environment:
CPU: 4x Intel Xeon Quad Core 2.66GHz
Mem size: 6GB

ppc Test Environment:
CPU: 2x Dual Core PPC970MP
Mem size: 2GB

The sample data of the dirty bitmap was produced by QEMU/KVM while the guest OS was live migrating. To measure the runtime I copied cpu_get_real_ticks() of QEMU to my test program.

Experimental results:

Test1: Guest OS read 3GB file, which is bigger than memory.
      orig.(msec)  patch(msec)  ratio
x86   0.3          0.1          6.4
ppc   7.9          2.7          3.0

Test2: Guest OS read/write 3GB file, which is bigger than memory.
      orig.(msec)  patch(msec)  ratio
x86   12.0         3.2          3.7
ppc   251.1        123          2.0

I also measured the runtime of bswap itself on ppc, and found it was only 0.3% ~ 0.7% of the runtime described above.

Awesome! Thank you so much for giving actual data to make me feel comfortable with it :-).
Alex-- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] Re: [PATCH v2] qemu-kvm: Speed up of the dirty-bitmap-traveling
We think? I mean - yes, I think so too. But have you actually measured it? How much improvement are we talking here? Is it still faster when a bswap is involved?

Thanks for pointing out. I will post the data for x86 later. However, I don't have a test environment to check the impact of bswap. Would you please measure the run time between the following section if possible?

It'd make more sense to have a real stand alone test program, no? I can try to write one today, but I have some really nasty important bugs to fix first.

OK. I will prepare a test code with sample data. Since I found a ppc machine around, I will run the code and post the results of x86 and ppc.

By the way, the following data is a result of x86 measured in QEMU/KVM. This data shows how many times the function is called (#called), runtime of the original function (orig.), runtime of this patch (patch), and the speedup ratio (ratio).

Test1: Guest OS read 3GB file, which is bigger than memory.
#called  orig.(msec)  patch(msec)  ratio
108      1.1          0.1          7.6
102      1.0          0.1          6.8
132      1.6          0.2          7.1

Test2: Guest OS read/write 3GB file, which is bigger than memory.
#called  orig.(msec)  patch(msec)  ratio
2394     33           7.7          4.3
2100     29           7.1          4.1
2832     40           9.9          4.0
Re: [Qemu-devel] Re: [PATCH v2] qemu-kvm: Speed up of the dirty-bitmap-traveling
On 17.02.2010, at 10:42, OHMURA Kei wrote:

We think? I mean - yes, I think so too. But have you actually measured it? How much improvement are we talking here? Is it still faster when a bswap is involved?

Thanks for pointing out. I will post the data for x86 later. However, I don't have a test environment to check the impact of bswap. Would you please measure the run time between the following section if possible?

It'd make more sense to have a real stand alone test program, no? I can try to write one today, but I have some really nasty important bugs to fix first.

OK. I will prepare a test code with sample data. Since I found a ppc machine around, I will run the code and post the results of x86 and ppc. By the way, the following data is a result of x86 measured in QEMU/KVM. This data shows how many times the function is called (#called), runtime of the original function (orig.), runtime of this patch (patch), and the speedup ratio (ratio).

That does indeed look promising! Thanks for doing this micro-benchmark. I just want to be 100% sure that it doesn't affect performance for big endian badly.

Alex
Re: [Qemu-devel] Re: [PATCH v2] qemu-kvm: Speed up of the dirty-bitmap-traveling
On 02/17/2010 11:42 AM, OHMURA Kei wrote:

We think? I mean - yes, I think so too. But have you actually measured it? How much improvement are we talking here? Is it still faster when a bswap is involved?

Thanks for pointing out. I will post the data for x86 later. However, I don't have a test environment to check the impact of bswap. Would you please measure the run time between the following section if possible?

It'd make more sense to have a real stand alone test program, no? I can try to write one today, but I have some really nasty important bugs to fix first.

OK. I will prepare a test code with sample data. Since I found a ppc machine around, I will run the code and post the results of x86 and ppc.

I've applied the patch - I think the x86 results justify it, and I'll be very surprised if ppc doesn't show a similar gain. Skipping 7 memory accesses and 7 tests must be a win.

--
error compiling committee.c: too many arguments to function
Re: [Qemu-devel] Re: [PATCH v2] qemu-kvm: Speed up of the dirty-bitmap-traveling
On 17.02.2010, at 10:47, Avi Kivity wrote:

On 02/17/2010 11:42 AM, OHMURA Kei wrote:

We think? I mean - yes, I think so too. But have you actually measured it? How much improvement are we talking here? Is it still faster when a bswap is involved?

Thanks for pointing out. I will post the data for x86 later. However, I don't have a test environment to check the impact of bswap. Would you please measure the run time between the following section if possible?

It'd make more sense to have a real stand alone test program, no? I can try to write one today, but I have some really nasty important bugs to fix first.

OK. I will prepare a test code with sample data. Since I found a ppc machine around, I will run the code and post the results of x86 and ppc.

I've applied the patch - I think the x86 results justify it, and I'll be very surprised if ppc doesn't show a similar gain. Skipping 7 memory accesses and 7 tests must be a win.

Sounds good to me. I don't assume bswap to be horribly slow either. Just want to be sure.

Alex
Re: [Qemu-devel] Re: [PATCH v2] qemu-kvm: Speed up of the dirty-bitmap-traveling
We think? I mean - yes, I think so too. But have you actually measured it? How much improvement are we talking here? Is it still faster when a bswap is involved?

Thanks for pointing out. I will post the data for x86 later. However, I don't have a test environment to check the impact of bswap. Would you please measure the run time between the following section if possible?

It'd make more sense to have a real stand alone test program, no? I can try to write one today, but I have some really nasty important bugs to fix first.

OK. I will prepare a test code with sample data. Since I found a ppc machine around, I will run the code and post the results of x86 and ppc. By the way, the following data is a result of x86 measured in QEMU/KVM. This data shows how many times the function is called (#called), runtime of the original function (orig.), runtime of this patch (patch), and the speedup ratio (ratio).

That does indeed look promising! Thanks for doing this micro-benchmark. I just want to be 100% sure that it doesn't affect performance for big endian badly.

I measured runtime of the test code with sample data. My test environment and results are described below.

x86 Test Environment:
CPU: 4x Intel Xeon Quad Core 2.66GHz
Mem size: 6GB

ppc Test Environment:
CPU: 2x Dual Core PPC970MP
Mem size: 2GB

The sample data of the dirty bitmap was produced by QEMU/KVM while the guest OS was live migrating. To measure the runtime I copied cpu_get_real_ticks() of QEMU to my test program.

Experimental results:

Test1: Guest OS read 3GB file, which is bigger than memory.
      orig.(msec)  patch(msec)  ratio
x86   0.3          0.1          6.4
ppc   7.9          2.7          3.0

Test2: Guest OS read/write 3GB file, which is bigger than memory.
      orig.(msec)  patch(msec)  ratio
x86   12.0         3.2          3.7
ppc   251.1        123          2.0

I also measured the runtime of bswap itself on ppc, and found it was only 0.3% ~ 0.7% of the runtime described above.
Re: [Qemu-devel] Re: [PATCH v2] qemu-kvm: Speed up of the dirty-bitmap-traveling
We think? I mean - yes, I think so too. But have you actually measured it? How much improvement are we talking here? Is it still faster when a bswap is involved?

Thanks for pointing out. I will post the data for x86 later. However, I don't have a test environment to check the impact of bswap. Would you please measure the run time between the following section if possible?

---- start ----
qemu-kvm.c:
static int kvm_get_dirty_bitmap_cb(unsigned long start, unsigned long len,
                                   void *bitmap, void *opaque)
{
    /* warm up each function */
    kvm_get_dirty_pages_log_range(start, bitmap, start, len);
    kvm_get_dirty_pages_log_range_new(start, bitmap, start, len);

    /* measurement */
    int64_t t1, t2;
    t1 = cpu_get_real_ticks();
    kvm_get_dirty_pages_log_range(start, bitmap, start, len);
    t1 = cpu_get_real_ticks() - t1;
    t2 = cpu_get_real_ticks();
    kvm_get_dirty_pages_log_range_new(start, bitmap, start, len);
    t2 = cpu_get_real_ticks() - t2;
    printf("## %zd, %zd\n", t1, t2);
    fflush(stdout);

    return kvm_get_dirty_pages_log_range_new(start, bitmap, start, len);
}
---- end ----
Re: [Qemu-devel] Re: [PATCH v2] qemu-kvm: Speed up of the dirty-bitmap-traveling
On 16.02.2010, at 12:16, OHMURA Kei wrote:

We think? I mean - yes, I think so too. But have you actually measured it? How much improvement are we talking here? Is it still faster when a bswap is involved?

Thanks for pointing out. I will post the data for x86 later. However, I don't have a test environment to check the impact of bswap. Would you please measure the run time between the following section if possible?

It'd make more sense to have a real stand alone test program, no? I can try to write one today, but I have some really nasty important bugs to fix first.

Alex
Re: [Qemu-devel] Re: [PATCH v2] qemu-kvm: Speed up of the dirty-bitmap-traveling
On 15.02.2010, at 07:12, OHMURA Kei wrote:

dirty-bitmap-traveling is carried out by byte size in qemu-kvm.c. But We think that dirty-bitmap-traveling by long size is faster than by byte

We think? I mean - yes, I think so too. But have you actually measured it? How much improvement are we talking here? Is it still faster when a bswap is involved?

Alex
Re: [PATCH v2] qemu-kvm: Speed up of the dirty-bitmap-traveling
On 02/12/2010 04:03 AM, OHMURA Kei wrote:

On 02/11/2010 Anthony Liguori anth...@codemonkey.ws wrote:

Oh, I see what's happening here. Yes, I think a leul_to_cpu() makes more sense.

Maybe I'm missing something here. I couldn't find leul_to_cpu(), so have defined it in bswap.h. Correct?

--- a/bswap.h
+++ b/bswap.h
@@ -205,8 +205,10 @@ static inline void cpu_to_be32wu(uint32_t *p, uint32_t v)
 
 #ifdef HOST_WORDS_BIGENDIAN
 #define cpu_to_32wu cpu_to_be32wu
+#define leul_to_cpu(v) le ## HOST_LONG_BITS ## _to_cpu(v)
 #else
 #define cpu_to_32wu cpu_to_le32wu
+#define leul_to_cpu(v) (v)
 #endif

On 02/10/2010 Ulrich Drepper drep...@redhat.com wrote:

If you're optimizing this code you might want to do it all. The compiler might not see through the bswap call and create unnecessary data dependencies. Especially problematic if the bitmap is really sparse. Also, the outer test is != while the inner test is >. Be consistent. I suggest to replace the inner loop with do { ... } while (c != 0); Depending on how sparse the bitmap is populated this might reduce the number of data dependencies quite a bit.

Combining all comments, the code would be like this.

if (bitmap_ul[i] != 0) {
    c = leul_to_cpu(bitmap_ul[i]);
    do {
        j = ffsl(c) - 1;
        c &= ~(1ul << j);
        page_number = i * HOST_LONG_BITS + j;
        addr1 = page_number * TARGET_PAGE_SIZE;
        addr = offset + addr1;
        ram_addr = cpu_get_physical_page_desc(addr);
        cpu_physical_memory_set_dirty(ram_addr);
    } while (c != 0);
}

Except you don't need bitmap_ul any more - you can change the type of the bitmap variable, since all accesses should now be ulongs.
Re: [PATCH v2] qemu-kvm: Speed up of the dirty-bitmap-traveling
dirty-bitmap-traveling is carried out by byte size in qemu-kvm.c. But We think that dirty-bitmap-traveling by long size is faster than by byte size especially when most of memory is not dirty.

Signed-off-by: OHMURA Kei ohmura@lab.ntt.co.jp
---
 bswap.h    |  2 ++
 qemu-kvm.c | 31 ++++++++++++++++---------------
 2 files changed, 18 insertions(+), 15 deletions(-)

diff --git a/bswap.h b/bswap.h
index 4558704..1f87e6d 100644
--- a/bswap.h
+++ b/bswap.h
@@ -205,8 +205,10 @@ static inline void cpu_to_be32wu(uint32_t *p, uint32_t v)
 
 #ifdef HOST_WORDS_BIGENDIAN
 #define cpu_to_32wu cpu_to_be32wu
+#define leul_to_cpu(v) le ## HOST_LONG_BITS ## _to_cpu(v)
 #else
 #define cpu_to_32wu cpu_to_le32wu
+#define leul_to_cpu(v) (v)
 #endif
 
 #undef le_bswap
diff --git a/qemu-kvm.c b/qemu-kvm.c
index a305907..6952aa5 100644
--- a/qemu-kvm.c
+++ b/qemu-kvm.c
@@ -2434,31 +2434,32 @@ int kvm_physical_memory_set_dirty_tracking(int enable)
 
 /* get kvm's dirty pages bitmap and update qemu's */
 static int kvm_get_dirty_pages_log_range(unsigned long start_addr,
-                                         unsigned char *bitmap,
+                                         unsigned long *bitmap,
                                          unsigned long offset,
                                          unsigned long mem_size)
 {
-    unsigned int i, j, n = 0;
-    unsigned char c;
-    unsigned long page_number, addr, addr1;
+    unsigned int i, j;
+    unsigned long page_number, addr, addr1, c;
     ram_addr_t ram_addr;
-    unsigned int len = ((mem_size / TARGET_PAGE_SIZE) + 7) / 8;
+    unsigned int len = ((mem_size / TARGET_PAGE_SIZE) + HOST_LONG_BITS - 1) /
+        HOST_LONG_BITS;
 
     /*
      * bitmap-traveling is faster than memory-traveling (for addr...)
      * especially when most of the memory is not dirty.
      */
     for (i = 0; i < len; i++) {
-        c = bitmap[i];
-        while (c > 0) {
-            j = ffsl(c) - 1;
-            c &= ~(1u << j);
-            page_number = i * 8 + j;
-            addr1 = page_number * TARGET_PAGE_SIZE;
-            addr = offset + addr1;
-            ram_addr = cpu_get_physical_page_desc(addr);
-            cpu_physical_memory_set_dirty(ram_addr);
-            n++;
+        if (bitmap[i] != 0) {
+            c = leul_to_cpu(bitmap[i]);
+            do {
+                j = ffsl(c) - 1;
+                c &= ~(1ul << j);
+                page_number = i * HOST_LONG_BITS + j;
+                addr1 = page_number * TARGET_PAGE_SIZE;
+                addr = offset + addr1;
+                ram_addr = cpu_get_physical_page_desc(addr);
+                cpu_physical_memory_set_dirty(ram_addr);
+            } while (c != 0);
         }
     }
     return 0;
-- 
1.6.3.3
Re: [PATCH v2] qemu-kvm: Speed up of the dirty-bitmap-traveling
On 02/11/2010 Anthony Liguori anth...@codemonkey.ws wrote:

Oh, I see what's happening here. Yes, I think a leul_to_cpu() makes more sense.

Maybe I'm missing something here. I couldn't find leul_to_cpu(), so have defined it in bswap.h. Correct?

--- a/bswap.h
+++ b/bswap.h
@@ -205,8 +205,10 @@ static inline void cpu_to_be32wu(uint32_t *p, uint32_t v)
 
 #ifdef HOST_WORDS_BIGENDIAN
 #define cpu_to_32wu cpu_to_be32wu
+#define leul_to_cpu(v) le ## HOST_LONG_BITS ## _to_cpu(v)
 #else
 #define cpu_to_32wu cpu_to_le32wu
+#define leul_to_cpu(v) (v)
 #endif

On 02/10/2010 Ulrich Drepper drep...@redhat.com wrote:

If you're optimizing this code you might want to do it all. The compiler might not see through the bswap call and create unnecessary data dependencies. Especially problematic if the bitmap is really sparse. Also, the outer test is != while the inner test is >. Be consistent. I suggest to replace the inner loop with do { ... } while (c != 0); Depending on how sparse the bitmap is populated this might reduce the number of data dependencies quite a bit.

Combining all comments, the code would be like this.

if (bitmap_ul[i] != 0) {
    c = leul_to_cpu(bitmap_ul[i]);
    do {
        j = ffsl(c) - 1;
        c &= ~(1ul << j);
        page_number = i * HOST_LONG_BITS + j;
        addr1 = page_number * TARGET_PAGE_SIZE;
        addr = offset + addr1;
        ram_addr = cpu_get_physical_page_desc(addr);
        cpu_physical_memory_set_dirty(ram_addr);
    } while (c != 0);
}
[PATCH v2] qemu-kvm: Speed up of the dirty-bitmap-traveling
dirty-bitmap-traveling is carried out by byte size in qemu-kvm.c. But We think that dirty-bitmap-traveling by long size is faster than by byte size especially when most of memory is not dirty.

Signed-off-by: OHMURA Kei ohmura@lab.ntt.co.jp
---
 bswap.h    |  1 -
 qemu-kvm.c | 30 ++++++++++++++++--------------
 2 files changed, 16 insertions(+), 15 deletions(-)

diff --git a/bswap.h b/bswap.h
index 4558704..d896f01 100644
--- a/bswap.h
+++ b/bswap.h
@@ -209,7 +209,6 @@ static inline void cpu_to_be32wu(uint32_t *p, uint32_t v)
 #define cpu_to_32wu cpu_to_le32wu
 #endif
 
-#undef le_bswap
 #undef be_bswap
 #undef le_bswaps
diff --git a/qemu-kvm.c b/qemu-kvm.c
index a305907..ea07912 100644
--- a/qemu-kvm.c
+++ b/qemu-kvm.c
@@ -2438,27 +2438,29 @@ static int kvm_get_dirty_pages_log_range(unsigned long start_addr,
                                          unsigned long offset,
                                          unsigned long mem_size)
 {
-    unsigned int i, j, n = 0;
-    unsigned char c;
-    unsigned long page_number, addr, addr1;
+    unsigned int i, j;
+    unsigned long page_number, addr, addr1, c;
     ram_addr_t ram_addr;
-    unsigned int len = ((mem_size / TARGET_PAGE_SIZE) + 7) / 8;
+    unsigned int len = ((mem_size / TARGET_PAGE_SIZE) + HOST_LONG_BITS - 1) /
+        HOST_LONG_BITS;
+    unsigned long *bitmap_ul = (unsigned long *)bitmap;
 
     /*
      * bitmap-traveling is faster than memory-traveling (for addr...)
      * especially when most of the memory is not dirty.
      */
     for (i = 0; i < len; i++) {
-        c = bitmap[i];
-        while (c > 0) {
-            j = ffsl(c) - 1;
-            c &= ~(1u << j);
-            page_number = i * 8 + j;
-            addr1 = page_number * TARGET_PAGE_SIZE;
-            addr = offset + addr1;
-            ram_addr = cpu_get_physical_page_desc(addr);
-            cpu_physical_memory_set_dirty(ram_addr);
-            n++;
+        if (bitmap_ul[i] != 0) {
+            c = le_bswap(bitmap_ul[i], HOST_LONG_BITS);
+            while (c > 0) {
+                j = ffsl(c) - 1;
+                c &= ~(1ul << j);
+                page_number = i * HOST_LONG_BITS + j;
+                addr1 = page_number * TARGET_PAGE_SIZE;
+                addr = offset + addr1;
+                ram_addr = cpu_get_physical_page_desc(addr);
+                cpu_physical_memory_set_dirty(ram_addr);
+            }
         }
     }
     return 0;
-- 
1.6.3.3
Re: [PATCH v2] qemu-kvm: Speed up of the dirty-bitmap-traveling
On 02/10/2010 02:52 AM, OHMURA Kei wrote:

for (i = 0; i < len; i++) {
-    c = bitmap[i];
-    while (c > 0) {
-        j = ffsl(c) - 1;
-        c &= ~(1u << j);
-        page_number = i * 8 + j;
-        addr1 = page_number * TARGET_PAGE_SIZE;
-        addr = offset + addr1;
-        ram_addr = cpu_get_physical_page_desc(addr);
-        cpu_physical_memory_set_dirty(ram_addr);
-        n++;
+    if (bitmap_ul[i] != 0) {
+        c = le_bswap(bitmap_ul[i], HOST_LONG_BITS);
+        while (c > 0) {
+            j = ffsl(c) - 1;
+            c &= ~(1ul << j);
+            page_number = i * HOST_LONG_BITS + j;
+            addr1 = page_number * TARGET_PAGE_SIZE;
+            addr = offset + addr1;
+            ram_addr = cpu_get_physical_page_desc(addr);
+            cpu_physical_memory_set_dirty(ram_addr);
+        }

If you're optimizing this code you might want to do it all. The compiler might not see through the bswap call and create unnecessary data dependencies. Especially problematic if the bitmap is really sparse.

Also, the outer test is != while the inner test is >. Be consistent. I suggest to replace the inner loop with do { ... } while (c != 0); Depending on how sparse the bitmap is populated this might reduce the number of data dependencies quite a bit.

--
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖
Re: [PATCH v2] qemu-kvm: Speed up of the dirty-bitmap-traveling
On 02/10/2010 12:52 PM, OHMURA Kei wrote:

dirty-bitmap-traveling is carried out by byte size in qemu-kvm.c. But We think that dirty-bitmap-traveling by long size is faster than by byte size especially when most of memory is not dirty.

--- a/bswap.h
+++ b/bswap.h
@@ -209,7 +209,6 @@ static inline void cpu_to_be32wu(uint32_t *p, uint32_t v)
 #define cpu_to_32wu cpu_to_le32wu
 #endif
 
-#undef le_bswap
 #undef be_bswap
 #undef le_bswaps

Anthony, is it okay to export le_bswap this way, or will you want leul_to_cpu()?
Re: [PATCH v2] qemu-kvm: Speed up of the dirty-bitmap-traveling
On 02/10/2010 07:20 AM, Avi Kivity wrote:

On 02/10/2010 12:52 PM, OHMURA Kei wrote:

dirty-bitmap-traveling is carried out by byte size in qemu-kvm.c. But We think that dirty-bitmap-traveling by long size is faster than by byte size especially when most of memory is not dirty.

--- a/bswap.h
+++ b/bswap.h
@@ -209,7 +209,6 @@ static inline void cpu_to_be32wu(uint32_t *p, uint32_t v)
 #define cpu_to_32wu cpu_to_le32wu
 #endif
 
-#undef le_bswap
 #undef be_bswap
 #undef le_bswaps

Anthony, is it okay to export le_bswap this way, or will you want leul_to_cpu()?

kvm_get_dirty_pages_log_range() is kvm-specific code. We're guaranteed that when we're using kvm, target byte order == host byte order. So is it really necessary to use a byte swapping function at all?

Regards,

Anthony Liguori
Re: [PATCH v2] qemu-kvm: Speed up of the dirty-bitmap-traveling
On 02/10/2010 07:20 AM, Avi Kivity wrote:

On 02/10/2010 12:52 PM, OHMURA Kei wrote:

dirty-bitmap-traveling is carried out by byte size in qemu-kvm.c. But We think that dirty-bitmap-traveling by long size is faster than by byte size especially when most of memory is not dirty.

--- a/bswap.h
+++ b/bswap.h
@@ -209,7 +209,6 @@ static inline void cpu_to_be32wu(uint32_t *p, uint32_t v)
 #define cpu_to_32wu cpu_to_le32wu
 #endif
 
-#undef le_bswap
 #undef be_bswap
 #undef le_bswaps

Anthony, is it okay to export le_bswap this way, or will you want leul_to_cpu()?

Oh, I see what's happening here. Yes, I think a leul_to_cpu() makes more sense.

Regards,

Anthony Liguori
Re: [PATCH v2] qemu-kvm: Speed up of the dirty-bitmap-traveling
On 02/10/2010 05:54 PM, Anthony Liguori wrote:

On 02/10/2010 07:20 AM, Avi Kivity wrote:

On 02/10/2010 12:52 PM, OHMURA Kei wrote:

dirty-bitmap-traveling is carried out by byte size in qemu-kvm.c. But We think that dirty-bitmap-traveling by long size is faster than by byte size especially when most of memory is not dirty.

--- a/bswap.h
+++ b/bswap.h
@@ -209,7 +209,6 @@ static inline void cpu_to_be32wu(uint32_t *p, uint32_t v)
 #define cpu_to_32wu cpu_to_le32wu
 #endif
 
-#undef le_bswap
 #undef be_bswap
 #undef le_bswaps

Anthony, is it okay to export le_bswap this way, or will you want leul_to_cpu()?

kvm_get_dirty_pages_log_range() is kvm-specific code. We're guaranteed that when we're using kvm, target byte order == host byte order. So is it really necessary to use a byte swapping function at all?

The dirty log bitmap is always little endian. This is so we don't have to depend on sizeof(long) (which can vary between kernel and userspace) or mandate some other access size. (if native endian worked, then the previous byte-based code would have been broken on big endian).

Seriously, those who say that big vs little endian is a matter of taste are missing something.
Re: [Qemu-devel] Re: [PATCH v2] qemu-kvm: Speed up of the dirty-bitmap-traveling
Anthony Liguori wrote:

On 02/10/2010 07:20 AM, Avi Kivity wrote:

On 02/10/2010 12:52 PM, OHMURA Kei wrote:

dirty-bitmap-traveling is carried out by byte size in qemu-kvm.c. But We think that dirty-bitmap-traveling by long size is faster than by byte size especially when most of memory is not dirty.

--- a/bswap.h
+++ b/bswap.h
@@ -209,7 +209,6 @@ static inline void cpu_to_be32wu(uint32_t *p, uint32_t v)
 #define cpu_to_32wu cpu_to_le32wu
 #endif
 
-#undef le_bswap
 #undef be_bswap
 #undef le_bswaps

Anthony, is it okay to export le_bswap this way, or will you want leul_to_cpu()?

kvm_get_dirty_pages_log_range() is kvm-specific code. We're guaranteed that when we're using kvm, target byte order == host byte order. So is it really necessary to use a byte swapping function at all?

On PPC the bitmap is Little Endian.

Alex
Re: [Qemu-devel] Re: [PATCH v2] qemu-kvm: Speed up of the dirty-bitmap-traveling
On 02/10/2010 10:00 AM, Alexander Graf wrote:

On PPC the bitmap is Little Endian.

Out of curiosity, why? It seems like an odd interface.

Regards,

Anthony Liguori
Re: [Qemu-devel] Re: [PATCH v2] qemu-kvm: Speed up of the dirty-bitmap-traveling
Anthony Liguori wrote:

On 02/10/2010 10:00 AM, Alexander Graf wrote:

On PPC the bitmap is Little Endian.

Out of curiosity, why? It seems like an odd interface.

Because on PPC, you usually run PPC32 userspace code on a PPC64 kernel. Unlike with x86, there's no real benefit in using 64 bit userspace. So thanks to the nature of big endianness, that breaks our set_bit helpers, because they assume you're using long data types for the bits. While that's no real issue on little endian, since the next int is just the high part of a u64, it messes everything up on ppc.

For more details, please just look in the archives on my patches to make it little endian.

Alex
Re: [Qemu-devel] Re: [PATCH v2] qemu-kvm: Speed up of the dirty-bitmap-traveling
On 02/10/2010 06:35 PM, Anthony Liguori wrote:

On 02/10/2010 10:00 AM, Alexander Graf wrote:

On PPC the bitmap is Little Endian.

Out of curiosity, why? It seems like an odd interface.

Exactly this issue. If you specify it as unsigned long native endian, there is ambiguity between 32-bit and 64-bit userspace. If you specify it as uint64_t native endian, you have an inefficient implementation on 32-bit userspace. So we went for unsigned byte native endian, which is the same as any size little endian.

(well I think the real reason is that it just grew that way out of x86, but the above is quite plausible)
Re: [Qemu-devel] Re: [PATCH v2] qemu-kvm: Speed up of the dirty-bitmap-traveling
On 02/10/2010 06:43 PM, Alexander Graf wrote:

Out of curiosity, why? It seems like an odd interface.

Because on PPC, you usually run PPC32 userspace code on a PPC64 kernel. Unlike with x86, there's no real benefit in using 64 bit userspace.

btw, does 32-bit ppc qemu support large memory guests? It doesn't on x86, and I don't remember any hacks to support large memory guests elsewhere.
Re: [Qemu-devel] Re: [PATCH v2] qemu-kvm: Speed up of the dirty-bitmap-traveling
Avi Kivity wrote:

On 02/10/2010 06:43 PM, Alexander Graf wrote:

Out of curiosity, why? It seems like an odd interface.

Because on PPC, you usually run PPC32 userspace code on a PPC64 kernel. Unlike with x86, there's no real benefit in using 64 bit userspace.

btw, does 32-bit ppc qemu support large memory guests? It doesn't on x86, and I don't remember any hacks to support large memory guests elsewhere.

It doesn't :-). In fact, the guest we virtualize wouldn't work with 2 GB anyways, because that needs an iommu implementation.

Alex
Re: [Qemu-devel] Re: [PATCH v2] qemu-kvm: Speed up of the dirty-bitmap-traveling
On 02/10/2010 06:47 PM, Alexander Graf wrote:

Because on PPC, you usually run PPC32 userspace code on a PPC64 kernel. Unlike with x86, there's no real benefit in using 64 bit userspace.

btw, does 32-bit ppc qemu support large memory guests? It doesn't on x86, and I don't remember any hacks to support large memory guests elsewhere.

It doesn't :-). In fact, the guest we virtualize wouldn't work with 2 GB anyways, because that needs an iommu implementation.

Oh, so you may want to revisit the "there's no real benefit in using 64 bit userspace".

Seriously, that looks like a big deficiency. What would it take to implement an iommu? I imagine Anthony's latest patches are a first step in that journey.
Re: [Qemu-devel] Re: [PATCH v2] qemu-kvm: Speed up of the dirty-bitmap-traveling
Avi Kivity wrote:

On 02/10/2010 06:47 PM, Alexander Graf wrote:

Because on PPC, you usually run PPC32 userspace code on a PPC64 kernel. Unlike with x86, there's no real benefit in using 64 bit userspace.

btw, does 32-bit ppc qemu support large memory guests? It doesn't on x86, and I don't remember any hacks to support large memory guests elsewhere.

It doesn't :-). In fact, the guest we virtualize wouldn't work with 2 GB anyways, because that needs an iommu implementation.

Oh, so you may want to revisit the "there's no real benefit in using 64 bit userspace".

Well, for normal users they don't. SLES11 is 64-bit only, so we're good on that. But openSUSE uses 32-bit userland.

Seriously, that looks like a big deficiency. What would it take to implement an iommu? I imagine Anthony's latest patches are a first step in that journey.

All reads/writes from PCI devices would need to go through a wrapper. Maybe we could also define a per-device offset for memory accesses. That way the overhead might be less. Yes, Anthony's patches look like they are a really big step in that direction.

Alex