Re: [Qemu-devel] Re: [PATCH v2] qemu-kvm: Speed up of the dirty-bitmap-traveling

2010-02-18 Thread Alexander Graf

On 18.02.2010, at 06:57, OHMURA Kei wrote:

 We think? I mean - yes, I think so too. But have you actually measured 
 it?
 How much improvement are we talking here?
 Is it still faster when a bswap is involved?
 Thanks for pointing this out.
 I will post the data for x86 later.
 However, I don't have a test environment to check the impact of bswap.
 Would you please measure the run time of the following section if possible?
 It'd make more sense to have a real stand alone test program, no?
 I can try to write one today, but I have some really nasty important bugs 
 to fix first.
 
 OK.  I will prepare test code with sample data.  Since I found a ppc
 machine around, I will run the code and post the results for x86 and ppc.
 
 
 By the way, the following data is a result for x86 measured in QEMU/KVM.
 This data shows how many times the function is called (#called), the runtime
 of the original function (orig.), the runtime with this patch (patch), and
 the speedup ratio (ratio).
 That does indeed look promising!
 Thanks for doing this micro-benchmark. I just want to be 100% sure that it
 doesn't hurt performance on big endian.
 
 
 I measured the runtime of the test code with sample data.  My test environment
 and results are described below.
 
 x86 Test Environment:
 CPU: 4x Intel Xeon Quad Core 2.66GHz
 Mem size: 6GB
 
 ppc Test Environment:
 CPU: 2x Dual Core PPC970MP
 Mem size: 2GB
 
 The dirty-bitmap sample data was produced by QEMU/KVM while the guest OS
 was live migrating.  To measure the runtime I copied cpu_get_real_ticks() from
 QEMU into my test program.
 
 
 Experimental results:
 Test1: Guest OS reads a 3GB file, which is bigger than memory.
        orig.(msec)  patch(msec)  ratio
 x86    0.3          0.1          6.4
 ppc    7.9          2.7          3.0
 
 Test2: Guest OS reads/writes a 3GB file, which is bigger than memory.
        orig.(msec)  patch(msec)  ratio
 x86    12.0         3.2          3.7
 ppc    251.1        123          2.0
 
 I also measured the runtime of bswap itself on ppc, and found it was only
 about 0.3%-0.7% of the runtime described above.

Awesome! Thank you so much for giving actual data to make me feel comfortable 
with it :-).


Alex


Re: [Qemu-devel] Re: [PATCH v2] qemu-kvm: Speed up of the dirty-bitmap-traveling

2010-02-17 Thread OHMURA Kei

We think? I mean - yes, I think so too. But have you actually measured it?
How much improvement are we talking here?
Is it still faster when a bswap is involved?

Thanks for pointing this out.
I will post the data for x86 later.
However, I don't have a test environment to check the impact of bswap.
Would you please measure the run time of the following section if possible?


It'd make more sense to have a real stand alone test program, no?
I can try to write one today, but I have some really nasty important bugs to 
fix first.



OK.  I will prepare test code with sample data.
Since I found a ppc machine around, I will run the code and post the results
for x86 and ppc.


By the way, the following data is a result for x86 measured in QEMU/KVM.
This data shows how many times the function is called (#called), the runtime of
the original function (orig.), the runtime with this patch (patch), and the
speedup ratio (ratio).


Test1: Guest OS reads a 3GB file, which is bigger than memory.
#called  orig.(msec)  patch(msec)  ratio
108      1.1          0.1          7.6
102      1.0          0.1          6.8
132      1.6          0.2          7.1

Test2: Guest OS reads/writes a 3GB file, which is bigger than memory.
#called  orig.(msec)  patch(msec)  ratio
2394     33           7.7          4.3
2100     29           7.1          4.1
2832     40           9.9          4.0



Re: [Qemu-devel] Re: [PATCH v2] qemu-kvm: Speed up of the dirty-bitmap-traveling

2010-02-17 Thread Alexander Graf

On 17.02.2010, at 10:42, OHMURA Kei wrote:

 We think? I mean - yes, I think so too. But have you actually measured 
 it?
 How much improvement are we talking here?
 Is it still faster when a bswap is involved?
 Thanks for pointing this out.
 I will post the data for x86 later.
 However, I don't have a test environment to check the impact of bswap.
 Would you please measure the run time of the following section if possible?
 It'd make more sense to have a real stand alone test program, no?
 I can try to write one today, but I have some really nasty important bugs to 
 fix first.
 
 
 OK.  I will prepare test code with sample data.  Since I found a ppc
 machine around, I will run the code and post the results for x86 and ppc.
 
 
 By the way, the following data is a result for x86 measured in QEMU/KVM.
 This data shows how many times the function is called (#called), the runtime of
 the original function (orig.), the runtime with this patch (patch), and the
 speedup ratio (ratio).

That does indeed look promising!

Thanks for doing this micro-benchmark. I just want to be 100% sure that it
doesn't hurt performance on big endian.


Alex


Re: [Qemu-devel] Re: [PATCH v2] qemu-kvm: Speed up of the dirty-bitmap-traveling

2010-02-17 Thread Avi Kivity

On 02/17/2010 11:42 AM, OHMURA Kei wrote:
We think? I mean - yes, I think so too. But have you actually 
measured it?

How much improvement are we talking here?
Is it still faster when a bswap is involved?

Thanks for pointing this out.
I will post the data for x86 later.
However, I don't have a test environment to check the impact of bswap.
Would you please measure the run time of the following section
if possible?


It'd make more sense to have a real stand alone test program, no?
I can try to write one today, but I have some really nasty important 
bugs to fix first.



OK.  I will prepare test code with sample data.  Since I found a ppc
machine around, I will run the code and post the results for x86 and ppc.



I've applied the patch - I think the x86 results justify it, and I'll be 
very surprised if ppc doesn't show a similar gain.  Skipping 7 memory 
accesses and 7 tests must be a win.



--
error compiling committee.c: too many arguments to function



Re: [Qemu-devel] Re: [PATCH v2] qemu-kvm: Speed up of the dirty-bitmap-traveling

2010-02-17 Thread Alexander Graf

On 17.02.2010, at 10:47, Avi Kivity wrote:

 On 02/17/2010 11:42 AM, OHMURA Kei wrote:
 We think? I mean - yes, I think so too. But have you actually measured 
 it?
 How much improvement are we talking here?
 Is it still faster when a bswap is involved?
 Thanks for pointing this out.
 I will post the data for x86 later.
 However, I don't have a test environment to check the impact of bswap.
 Would you please measure the run time of the following section if possible?
 
 It'd make more sense to have a real stand alone test program, no?
 I can try to write one today, but I have some really nasty important bugs 
 to fix first.
 
 
 OK.  I will prepare test code with sample data.  Since I found a ppc
 machine around, I will run the code and post the results for x86 and ppc.
 
 
 I've applied the patch - I think the x86 results justify it, and I'll be very 
 surprised if ppc doesn't show a similar gain.  Skipping 7 memory accesses and 
 7 tests must be a win.

Sounds good to me. I don't expect bswap to be horribly slow either. Just want
to be sure.


Alex


Re: [Qemu-devel] Re: [PATCH v2] qemu-kvm: Speed up of the dirty-bitmap-traveling

2010-02-17 Thread OHMURA Kei

We think? I mean - yes, I think so too. But have you actually measured it?
How much improvement are we talking here?
Is it still faster when a bswap is involved?

Thanks for pointing this out.
I will post the data for x86 later.
However, I don't have a test environment to check the impact of bswap.
Would you please measure the run time of the following section if possible?

It'd make more sense to have a real stand alone test program, no?
I can try to write one today, but I have some really nasty important bugs to 
fix first.


OK.  I will prepare test code with sample data.  Since I found a ppc machine
around, I will run the code and post the results for x86 and ppc.


By the way, the following data is a result for x86 measured in QEMU/KVM.
This data shows how many times the function is called (#called), the runtime of
the original function (orig.), the runtime with this patch (patch), and the
speedup ratio (ratio).


That does indeed look promising!

Thanks for doing this micro-benchmark. I just want to be 100% sure that it
doesn't hurt performance on big endian.



I measured the runtime of the test code with sample data.  My test environment
and results are described below.


x86 Test Environment:
CPU: 4x Intel Xeon Quad Core 2.66GHz
Mem size: 6GB

ppc Test Environment:
CPU: 2x Dual Core PPC970MP
Mem size: 2GB

The dirty-bitmap sample data was produced by QEMU/KVM while the guest OS
was live migrating.  To measure the runtime I copied cpu_get_real_ticks() from
QEMU into my test program.
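
For reference, a portable stand-in for cpu_get_real_ticks() outside QEMU (a
sketch assuming a POSIX host with clock_gettime(), not the tester actually
used here) would be:

#include <stdint.h>
#include <time.h>

/* Monotonic timestamp in nanoseconds; plays the role of
 * cpu_get_real_ticks() without pulling in QEMU headers. */
static int64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}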


Experimental results:
Test1: Guest OS reads a 3GB file, which is bigger than memory.
       orig.(msec)  patch(msec)  ratio
x86    0.3          0.1          6.4
ppc    7.9          2.7          3.0

Test2: Guest OS reads/writes a 3GB file, which is bigger than memory.
       orig.(msec)  patch(msec)  ratio
x86    12.0         3.2          3.7
ppc    251.1        123          2.0



I also measured the runtime of bswap itself on ppc, and found it was only
about 0.3%-0.7% of the runtime described above.




Re: [Qemu-devel] Re: [PATCH v2] qemu-kvm: Speed up of the dirty-bitmap-traveling

2010-02-16 Thread OHMURA Kei

We think? I mean - yes, I think so too. But have you actually measured it?
How much improvement are we talking here?
Is it still faster when a bswap is involved?


Thanks for pointing this out.
I will post the data for x86 later.
However, I don't have a test environment to check the impact of bswap.
Would you please measure the run time of the following section if possible?

---- start ----
qemu-kvm.c:

static int kvm_get_dirty_bitmap_cb(unsigned long start, unsigned long len,
                                   void *bitmap, void *opaque)
{
    /* warm up each function */
    kvm_get_dirty_pages_log_range(start, bitmap, start, len);
    kvm_get_dirty_pages_log_range_new(start, bitmap, start, len);

    /* measurement */
    int64_t t1, t2;
    t1 = cpu_get_real_ticks();
    kvm_get_dirty_pages_log_range(start, bitmap, start, len);
    t1 = cpu_get_real_ticks() - t1;
    t2 = cpu_get_real_ticks();
    kvm_get_dirty_pages_log_range_new(start, bitmap, start, len);
    t2 = cpu_get_real_ticks() - t2;

    printf("## %zd, %zd\n", t1, t2); fflush(stdout);

    return kvm_get_dirty_pages_log_range_new(start, bitmap, start, len);
}
---- end ----
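
Alexander's standalone-program idea can be sketched without any QEMU
dependencies along the following lines (an illustration only: synthetic
sparsely-dirtied bitmap, glibc ffsl(), little-endian host assumed so the
byte swap is omitted):

/* bench.c: compare byte-wise vs long-wise dirty-bitmap traveling.
 * Build: gcc -O2 -o bench bench.c */
#define _GNU_SOURCE
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>                          /* ffsl() on glibc */
#include <time.h>

#define BITMAP_BYTES (1u << 20)              /* 1 MiB of bitmap */
#define LONG_BITS    (sizeof(long) * 8)

static volatile unsigned long sink;          /* keeps the loops alive */

static int64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Original scheme: one load and one test per byte. */
static void travel_bytes(const unsigned char *bm, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        unsigned char c = bm[i];
        while (c > 0) {
            int j = ffsl(c) - 1;
            c &= ~(1u << j);
            sink += i * 8 + j;               /* stands in for page handling */
        }
    }
}

/* Patched scheme: one load and one test per long. */
static void travel_longs(const unsigned long *bm, size_t nwords)
{
    for (size_t i = 0; i < nwords; i++) {
        if (bm[i] != 0) {
            unsigned long c = bm[i];         /* leul_to_cpu() elided on LE */
            do {
                int j = ffsl(c) - 1;
                c &= ~(1ul << j);
                sink += i * LONG_BITS + j;
            } while (c != 0);
        }
    }
}

int main(void)
{
    unsigned char *bm = calloc(1, BITMAP_BYTES);
    for (int i = 0; i < 1000; i++)           /* dirty ~1000 random bits */
        bm[rand() % BITMAP_BYTES] |= 1u << (rand() % 8);

    int64_t t = now_ns();
    travel_bytes(bm, BITMAP_BYTES);
    printf("byte-wise: %lld ns\n", (long long)(now_ns() - t));

    t = now_ns();
    travel_longs((const unsigned long *)bm, BITMAP_BYTES / sizeof(long));
    printf("long-wise: %lld ns\n", (long long)(now_ns() - t));

    free(bm);
    return 0;
}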



Re: [Qemu-devel] Re: [PATCH v2] qemu-kvm: Speed up of the dirty-bitmap-traveling

2010-02-16 Thread Alexander Graf

On 16.02.2010, at 12:16, OHMURA Kei wrote:

 We think? I mean - yes, I think so too. But have you actually measured it?
 How much improvement are we talking here?
 Is it still faster when a bswap is involved?
 
 Thanks for pointing this out.
 I will post the data for x86 later.
 However, I don't have a test environment to check the impact of bswap.
 Would you please measure the run time of the following section if possible?

It'd make more sense to have a real stand alone test program, no?
I can try to write one today, but I have some really nasty important bugs to 
fix first.


Alex


Re: [Qemu-devel] Re: [PATCH v2] qemu-kvm: Speed up of the dirty-bitmap-traveling

2010-02-15 Thread Alexander Graf

On 15.02.2010, at 07:12, OHMURA Kei wrote:

 dirty-bitmap-traveling is carried out by byte size in qemu-kvm.c.
 But we think that dirty-bitmap-traveling by long size is faster than by byte

We think? I mean - yes, I think so too. But have you actually measured it? 
How much improvement are we talking here?
Is it still faster when a bswap is involved?

Alex


Re: [PATCH v2] qemu-kvm: Speed up of the dirty-bitmap-traveling

2010-02-14 Thread Avi Kivity
On 02/12/2010 04:03 AM, OHMURA Kei wrote:
 On 02/11/2010 Anthony Liguori anth...@codemonkey.ws wrote:
   
 Oh, I see what's happening here. Yes, I think a leul_to_cpu() makes more
 sense.
 
 Maybe I'm missing something here.
 I couldn't find leul_to_cpu(), so I have defined it in bswap.h.
 Correct?

 --- a/bswap.h
 +++ b/bswap.h
 @@ -205,8 +205,10 @@ static inline void cpu_to_be32wu(uint32_t *p, uint32_t v)
  
  #ifdef HOST_WORDS_BIGENDIAN
  #define cpu_to_32wu cpu_to_be32wu
 +#define leul_to_cpu(v) le ## HOST_LONG_BITS ## _to_cpu(v)
  #else
  #define cpu_to_32wu cpu_to_le32wu
 +#define leul_to_cpu(v) (v)
  #endif



 On 02/10/2010 Ulrich Drepper drep...@redhat.com wrote:
   
 If you're optimizing this code you might want to do it all.  The
 compiler might not see through the bswap call and create unnecessary
 data dependencies.  Especially problematic if the bitmap is really
 sparse.  Also, the outer test is != while the inner test is >.  Be
 consistent.  I suggest to replace the inner loop with

  do {
...
  } while (c != 0);

 Depending on how sparse the bitmap is populated this might reduce the
 number of data dependencies quite a bit.
 
 Combining all comments, the code would look like this.
 
     if (bitmap_ul[i] != 0) {
         c = leul_to_cpu(bitmap_ul[i]);
         do {
             j = ffsl(c) - 1;
             c &= ~(1ul << j);
             page_number = i * HOST_LONG_BITS + j;
             addr1 = page_number * TARGET_PAGE_SIZE;
             addr = offset + addr1;
             ram_addr = cpu_get_physical_page_desc(addr);
             cpu_physical_memory_set_dirty(ram_addr);
         } while (c != 0);
     }
   

Except you don't need bitmap_ul any more - you can change the type of
the bitmap variable, since all accesses should now be ulongs.

-- 
error compiling committee.c: too many arguments to function



Re: [PATCH v2] qemu-kvm: Speed up of the dirty-bitmap-traveling

2010-02-14 Thread OHMURA Kei
dirty-bitmap-traveling is carried out by byte size in qemu-kvm.c.
But we think that dirty-bitmap-traveling by long size is faster than by byte
size, especially when most of memory is not dirty.

Signed-off-by: OHMURA Kei ohmura@lab.ntt.co.jp
---
 bswap.h    |    2 ++
 qemu-kvm.c |   31 ++++++++++++++++---------------
 2 files changed, 18 insertions(+), 15 deletions(-)

diff --git a/bswap.h b/bswap.h
index 4558704..1f87e6d 100644
--- a/bswap.h
+++ b/bswap.h
@@ -205,8 +205,10 @@ static inline void cpu_to_be32wu(uint32_t *p, uint32_t v)
 
 #ifdef HOST_WORDS_BIGENDIAN
 #define cpu_to_32wu cpu_to_be32wu
+#define leul_to_cpu(v) le ## HOST_LONG_BITS ## _to_cpu(v)
 #else
 #define cpu_to_32wu cpu_to_le32wu
+#define leul_to_cpu(v) (v)
 #endif
 
 #undef le_bswap
diff --git a/qemu-kvm.c b/qemu-kvm.c
index a305907..6952aa5 100644
--- a/qemu-kvm.c
+++ b/qemu-kvm.c
@@ -2434,31 +2434,32 @@ int kvm_physical_memory_set_dirty_tracking(int enable)
 
 /* get kvm's dirty pages bitmap and update qemu's */
 static int kvm_get_dirty_pages_log_range(unsigned long start_addr,
-                                         unsigned char *bitmap,
+                                         unsigned long *bitmap,
                                          unsigned long offset,
                                          unsigned long mem_size)
 {
-    unsigned int i, j, n = 0;
-    unsigned char c;
-    unsigned long page_number, addr, addr1;
+    unsigned int i, j;
+    unsigned long page_number, addr, addr1, c;
     ram_addr_t ram_addr;
-    unsigned int len = ((mem_size / TARGET_PAGE_SIZE) + 7) / 8;
+    unsigned int len = ((mem_size / TARGET_PAGE_SIZE) + HOST_LONG_BITS - 1) /
+        HOST_LONG_BITS;
 
     /*
      * bitmap-traveling is faster than memory-traveling (for addr...)
      * especially when most of the memory is not dirty.
      */
     for (i = 0; i < len; i++) {
-        c = bitmap[i];
-        while (c > 0) {
-            j = ffsl(c) - 1;
-            c &= ~(1u << j);
-            page_number = i * 8 + j;
-            addr1 = page_number * TARGET_PAGE_SIZE;
-            addr = offset + addr1;
-            ram_addr = cpu_get_physical_page_desc(addr);
-            cpu_physical_memory_set_dirty(ram_addr);
-            n++;
+        if (bitmap[i] != 0) {
+            c = leul_to_cpu(bitmap[i]);
+            do {
+                j = ffsl(c) - 1;
+                c &= ~(1ul << j);
+                page_number = i * HOST_LONG_BITS + j;
+                addr1 = page_number * TARGET_PAGE_SIZE;
+                addr = offset + addr1;
+                ram_addr = cpu_get_physical_page_desc(addr);
+                cpu_physical_memory_set_dirty(ram_addr);
+            } while (c != 0);
         }
     }
     return 0;
-- 
1.6.3.3




Re: [PATCH v2] qemu-kvm: Speed up of the dirty-bitmap-traveling

2010-02-11 Thread OHMURA Kei
On 02/11/2010 Anthony Liguori anth...@codemonkey.ws wrote:
 Oh, I see what's happening here. Yes, I think a leul_to_cpu() makes more
 sense.

Maybe I'm missing something here.
I couldn't find leul_to_cpu(), so I have defined it in bswap.h.
Correct?

--- a/bswap.h
+++ b/bswap.h
@@ -205,8 +205,10 @@ static inline void cpu_to_be32wu(uint32_t *p, uint32_t v)
 
 #ifdef HOST_WORDS_BIGENDIAN
 #define cpu_to_32wu cpu_to_be32wu
+#define leul_to_cpu(v) le ## HOST_LONG_BITS ## _to_cpu(v)
 #else
 #define cpu_to_32wu cpu_to_le32wu
+#define leul_to_cpu(v) (v)
 #endif
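
For the record, the token pasting above expands to a host-width conversion;
spelled out for a 64-bit big-endian host (an illustrative expansion using the
GCC builtin, not QEMU's own le64_to_cpu):

/* le ## HOST_LONG_BITS ## _to_cpu(v) becomes le64_to_cpu(v) when
 * HOST_LONG_BITS is 64; on a big-endian host that is a byte swap. */
static inline unsigned long leul_to_cpu_expanded(unsigned long v)
{
    return __builtin_bswap64(v);
}
/* On little-endian hosts the macro collapses to (v): no code at all. */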



On 02/10/2010 Ulrich Drepper drep...@redhat.com wrote:
 If you're optimizing this code you might want to do it all.  The
 compiler might not see through the bswap call and create unnecessary
 data dependencies.  Especially problematic if the bitmap is really
 sparse.  Also, the outer test is != while the inner test is >.  Be
 consistent.  I suggest to replace the inner loop with
 
  do {
...
  } while (c != 0);
 
 Depending on how sparse the bitmap is populated this might reduce the
 number of data dependencies quite a bit.

Combining all comments, the code would look like this.

    if (bitmap_ul[i] != 0) {
        c = leul_to_cpu(bitmap_ul[i]);
        do {
            j = ffsl(c) - 1;
            c &= ~(1ul << j);
            page_number = i * HOST_LONG_BITS + j;
            addr1 = page_number * TARGET_PAGE_SIZE;
            addr = offset + addr1;
            ram_addr = cpu_get_physical_page_desc(addr);
            cpu_physical_memory_set_dirty(ram_addr);
        } while (c != 0);
    }


[PATCH v2] qemu-kvm: Speed up of the dirty-bitmap-traveling

2010-02-10 Thread OHMURA Kei
dirty-bitmap-traveling is carried out by byte size in qemu-kvm.c.
But we think that dirty-bitmap-traveling by long size is faster than by byte
size, especially when most of memory is not dirty.

Signed-off-by: OHMURA Kei ohmura@lab.ntt.co.jp
---
 bswap.h    |    1 -
 qemu-kvm.c |   30 ++++++++++++++++--------------
 2 files changed, 16 insertions(+), 15 deletions(-)

diff --git a/bswap.h b/bswap.h
index 4558704..d896f01 100644
--- a/bswap.h
+++ b/bswap.h
@@ -209,7 +209,6 @@ static inline void cpu_to_be32wu(uint32_t *p, uint32_t v)
 #define cpu_to_32wu cpu_to_le32wu
 #endif
 
-#undef le_bswap
 #undef be_bswap
 #undef le_bswaps
 #undef be_bswaps
diff --git a/qemu-kvm.c b/qemu-kvm.c
index a305907..ea07912 100644
--- a/qemu-kvm.c
+++ b/qemu-kvm.c
@@ -2438,27 +2438,29 @@ static int kvm_get_dirty_pages_log_range(unsigned long start_addr,
                                          unsigned long offset,
                                          unsigned long mem_size)
 {
-    unsigned int i, j, n = 0;
-    unsigned char c;
-    unsigned long page_number, addr, addr1;
+    unsigned int i, j;
+    unsigned long page_number, addr, addr1, c;
     ram_addr_t ram_addr;
-    unsigned int len = ((mem_size / TARGET_PAGE_SIZE) + 7) / 8;
+    unsigned int len = ((mem_size / TARGET_PAGE_SIZE) + HOST_LONG_BITS - 1) /
+        HOST_LONG_BITS;
+    unsigned long *bitmap_ul = (unsigned long *)bitmap;
 
     /*
      * bitmap-traveling is faster than memory-traveling (for addr...)
      * especially when most of the memory is not dirty.
      */
     for (i = 0; i < len; i++) {
-        c = bitmap[i];
-        while (c > 0) {
-            j = ffsl(c) - 1;
-            c &= ~(1u << j);
-            page_number = i * 8 + j;
-            addr1 = page_number * TARGET_PAGE_SIZE;
-            addr = offset + addr1;
-            ram_addr = cpu_get_physical_page_desc(addr);
-            cpu_physical_memory_set_dirty(ram_addr);
-            n++;
+        if (bitmap_ul[i] != 0) {
+            c = le_bswap(bitmap_ul[i], HOST_LONG_BITS);
+            while (c > 0) {
+                j = ffsl(c) - 1;
+                c &= ~(1ul << j);
+                page_number = i * HOST_LONG_BITS + j;
+                addr1 = page_number * TARGET_PAGE_SIZE;
+                addr = offset + addr1;
+                ram_addr = cpu_get_physical_page_desc(addr);
+                cpu_physical_memory_set_dirty(ram_addr);
+            }
         }
     }
     return 0;
-- 
1.6.3.3




Re: [PATCH v2] qemu-kvm: Speed up of the dirty-bitmap-traveling

2010-02-10 Thread Ulrich Drepper

On 02/10/2010 02:52 AM, OHMURA Kei wrote:

  for (i = 0; i < len; i++) {
 -        c = bitmap[i];
 -        while (c > 0) {
 -            j = ffsl(c) - 1;
 -            c &= ~(1u << j);
 -            page_number = i * 8 + j;
 -            addr1 = page_number * TARGET_PAGE_SIZE;
 -            addr = offset + addr1;
 -            ram_addr = cpu_get_physical_page_desc(addr);
 -            cpu_physical_memory_set_dirty(ram_addr);
 -            n++;
 +        if (bitmap_ul[i] != 0) {
 +            c = le_bswap(bitmap_ul[i], HOST_LONG_BITS);
 +            while (c > 0) {
 +                j = ffsl(c) - 1;
 +                c &= ~(1ul << j);
 +                page_number = i * HOST_LONG_BITS + j;
 +                addr1 = page_number * TARGET_PAGE_SIZE;
 +                addr = offset + addr1;
 +                ram_addr = cpu_get_physical_page_desc(addr);
 +                cpu_physical_memory_set_dirty(ram_addr);
 +            }

If you're optimizing this code you might want to do it all.  The
compiler might not see through the bswap call and create unnecessary
data dependencies.  Especially problematic if the bitmap is really
sparse.  Also, the outer test is != while the inner test is >.  Be
consistent.  I suggest to replace the inner loop with

  do {
...
  } while (c != 0);

Depending on how sparse the bitmap is populated this might reduce the
number of data dependencies quite a bit.

-- 
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖


Re: [PATCH v2] qemu-kvm: Speed up of the dirty-bitmap-traveling

2010-02-10 Thread Avi Kivity
On 02/10/2010 12:52 PM, OHMURA Kei wrote:
 dirty-bitmap-traveling is carried out by byte size in qemu-kvm.c.
 But we think that dirty-bitmap-traveling by long size is faster than by byte
 size, especially when most of memory is not dirty.

 --- a/bswap.h
 +++ b/bswap.h
 @@ -209,7 +209,6 @@ static inline void cpu_to_be32wu(uint32_t *p, uint32_t v)
  #define cpu_to_32wu cpu_to_le32wu
  #endif
  
 -#undef le_bswap
  #undef be_bswap
  #undef le_bswaps
   


Anthony, is it okay to export le_bswap this way, or will you want
leul_to_cpu()?

-- 
error compiling committee.c: too many arguments to function



Re: [PATCH v2] qemu-kvm: Speed up of the dirty-bitmap-traveling

2010-02-10 Thread Anthony Liguori
On 02/10/2010 07:20 AM, Avi Kivity wrote:
 On 02/10/2010 12:52 PM, OHMURA Kei wrote:
   
 dirty-bitmap-traveling is carried out by byte size in qemu-kvm.c.
 But we think that dirty-bitmap-traveling by long size is faster than by byte
 size, especially when most of memory is not dirty.

 --- a/bswap.h
 +++ b/bswap.h
 @@ -209,7 +209,6 @@ static inline void cpu_to_be32wu(uint32_t *p, uint32_t v)
  #define cpu_to_32wu cpu_to_le32wu
  #endif
  
 -#undef le_bswap
  #undef be_bswap
  #undef le_bswaps
   
 

 Anthony, is it okay to export le_bswap this way, or will you want
 leul_to_cpu()?
   

kvm_get_dirty_pages_log_range() is kvm-specific code. We're guaranteed
that when we're using kvm, target byte order == host byte order.

So is it really necessary to use a byte swapping function at all?

Regards,

Anthony Liguori




Re: [PATCH v2] qemu-kvm: Speed up of the dirty-bitmap-traveling

2010-02-10 Thread Anthony Liguori
On 02/10/2010 07:20 AM, Avi Kivity wrote:
 On 02/10/2010 12:52 PM, OHMURA Kei wrote:
   
 dirty-bitmap-traveling is carried out by byte size in qemu-kvm.c.
 But we think that dirty-bitmap-traveling by long size is faster than by byte
 size, especially when most of memory is not dirty.

 --- a/bswap.h
 +++ b/bswap.h
 @@ -209,7 +209,6 @@ static inline void cpu_to_be32wu(uint32_t *p, uint32_t v)
  #define cpu_to_32wu cpu_to_le32wu
  #endif
  
 -#undef le_bswap
  #undef be_bswap
  #undef le_bswaps
   
 

 Anthony, is it okay to export le_bswap this way, or will you want
 leul_to_cpu()?
   

Oh, I see what's happening here. Yes, I think a leul_to_cpu() makes more
sense.

Regards,

Anthony Liguori



Re: [PATCH v2] qemu-kvm: Speed up of the dirty-bitmap-traveling

2010-02-10 Thread Avi Kivity
On 02/10/2010 05:54 PM, Anthony Liguori wrote:
 On 02/10/2010 07:20 AM, Avi Kivity wrote:
   
 On 02/10/2010 12:52 PM, OHMURA Kei wrote:
   
 
 dirty-bitmap-traveling is carried out by byte size in qemu-kvm.c.
 But we think that dirty-bitmap-traveling by long size is faster than by byte
 size, especially when most of memory is not dirty.

 --- a/bswap.h
 +++ b/bswap.h
 @@ -209,7 +209,6 @@ static inline void cpu_to_be32wu(uint32_t *p, uint32_t 
 v)
  #define cpu_to_32wu cpu_to_le32wu
  #endif
  
 -#undef le_bswap
  #undef be_bswap
  #undef le_bswaps
   
 
   
 Anthony, is it okay to export le_bswap this way, or will you want
 leul_to_cpu()?
   
 
 kvm_get_dirty_pages_log_range() is kvm-specific code. We're guaranteed
 that when we're using kvm, target byte order == host byte order.

 So is it really necessary to use a byte swapping function at all?
   

The dirty log bitmap is always little endian. This is so we don't have
to depend on sizeof(long) (which can vary between kernel and userspace)
or mandate some other access size.

(if native endian worked, then the previous byte-based code would have
been broken on big endian).

Seriously, those who say that big vs little endian is a matter of taste
are missing something.
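
Avi's point can be demonstrated in a self-contained way (a hypothetical
example, GCC-style byte-order macros assumed): byte-wise readers never notice
endianness, while long-wise readers must convert from little endian first.

#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    /* The kernel marks page 9 dirty: bit 9 of a little-endian bitmap,
     * i.e. bit 1 of byte 1, no matter what word size the kernel used. */
    unsigned char bitmap[8] = { 0 };
    bitmap[9 / 8] |= 1u << (9 % 8);

    /* Byte-wise access: endian-neutral on every host. */
    printf("byte view: page 9 dirty = %d\n", (bitmap[9 / 8] >> (9 % 8)) & 1);

    /* Long-wise access: reinterpreting the same bytes as a 64-bit word
     * yields bit 9 only after little-endian conversion; on big-endian
     * hosts this is exactly where leul_to_cpu() is needed. */
    uint64_t w;
    memcpy(&w, bitmap, sizeof(w));
#if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
    w = __builtin_bswap64(w);                /* le64_to_cpu equivalent */
#endif
    printf("long view: page 9 dirty = %d\n", (int)((w >> 9) & 1));
    return 0;
}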


-- 
error compiling committee.c: too many arguments to function



Re: [Qemu-devel] Re: [PATCH v2] qemu-kvm: Speed up of the dirty-bitmap-traveling

2010-02-10 Thread Alexander Graf
Anthony Liguori wrote:
 On 02/10/2010 07:20 AM, Avi Kivity wrote:
   
 On 02/10/2010 12:52 PM, OHMURA Kei wrote:
   
 
 dirty-bitmap-traveling is carried out by byte size in qemu-kvm.c.
 But we think that dirty-bitmap-traveling by long size is faster than by byte
 size, especially when most of memory is not dirty.

 --- a/bswap.h
 +++ b/bswap.h
 @@ -209,7 +209,6 @@ static inline void cpu_to_be32wu(uint32_t *p, uint32_t 
 v)
  #define cpu_to_32wu cpu_to_le32wu
  #endif
  
 -#undef le_bswap
  #undef be_bswap
  #undef le_bswaps
   
 
   
 Anthony, is it okay to export le_bswap this way, or will you want
 leul_to_cpu()?
   
 

 kvm_get_dirty_pages_log_range() is kvm-specific code. We're guaranteed
 that when we're using kvm, target byte order == host byte order.

 So is it really necessary to use a byte swapping function at all?
   

On PPC the bitmap is Little Endian.


Alex


Re: [Qemu-devel] Re: [PATCH v2] qemu-kvm: Speed up of the dirty-bitmap-traveling

2010-02-10 Thread Anthony Liguori
On 02/10/2010 10:00 AM, Alexander Graf wrote:
 On PPC the bitmap is Little Endian.
   

Out of curiosity, why? It seems like an odd interface.

Regards,

Anthony Liguori



Re: [Qemu-devel] Re: [PATCH v2] qemu-kvm: Speed up of the dirty-bitmap-traveling

2010-02-10 Thread Alexander Graf
Anthony Liguori wrote:
 On 02/10/2010 10:00 AM, Alexander Graf wrote:
   
 On PPC the bitmap is Little Endian.
   
 

 Out of curiosity, why? It seems like an odd interface.
   

Because on PPC, you usually run PPC32 userspace code on a PPC64 kernel.
Unlike with x86, there's no real benefit in using 64 bit userspace.

So thanks to the nature of big endianness, that breaks our set_bit
helpers, because they assume you're using long data types for the
bits. While that's no real issue on little endian, since the next int is
just the high part of a u64, it messes everything up on ppc.

For more details, please just look in the archives on my patches to make
it little endian.
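
The mismatch is easy to make concrete (a hypothetical helper for
illustration, not kernel code): on big endian, which byte holds "bit 0"
depends on the word size the set_bit-style helpers were built for.

#include <stdio.h>

/* Byte offset holding bit n when the bitmap is an array of big-endian
 * words of word_bytes bytes (set_bit-style addressing). */
static unsigned be_byte_of_bit(unsigned n, unsigned word_bytes)
{
    unsigned bits = word_bytes * 8;
    return (n / bits) * word_bytes + (word_bytes - 1 - (n % bits) / 8);
}

int main(void)
{
    /* A 64-bit kernel and 32-bit userspace disagree about bit 0: */
    printf("BE u64 words: bit 0 -> byte %u\n", be_byte_of_bit(0, 8)); /* 7 */
    printf("BE u32 words: bit 0 -> byte %u\n", be_byte_of_bit(0, 4)); /* 3 */
    return 0;
}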


Alex


Re: [Qemu-devel] Re: [PATCH v2] qemu-kvm: Speed up of the dirty-bitmap-traveling

2010-02-10 Thread Avi Kivity
On 02/10/2010 06:35 PM, Anthony Liguori wrote:
 On 02/10/2010 10:00 AM, Alexander Graf wrote:
   
 On PPC the bitmap is Little Endian.
   
 
 Out of curiosity, why? It seems like an odd interface.

   

Exactly this issue. If you specify it as unsigned long native endian,
there is ambiguity between 32-bit and 64-bit userspace. If you specify
it as uint64_t native endian, you have an inefficient implementation on
32-bit userspace. So we went for unsigned byte native endian, which is
the same as any size little endian.

(well I think the real reason is that it just grew that way out of x86,
but the above is quite plausible).

-- 
error compiling committee.c: too many arguments to function



Re: [Qemu-devel] Re: [PATCH v2] qemu-kvm: Speed up of the dirty-bitmap-traveling

2010-02-10 Thread Avi Kivity
On 02/10/2010 06:43 PM, Alexander Graf wrote:

 Out of curiosity, why? It seems like an odd interface.
   
 
 Because on PPC, you usually run PPC32 userspace code on a PPC64 kernel.
 Unlike with x86, there's no real benefit in using 64 bit userspace.
   

btw, does 32-bit ppc qemu support large memory guests? It doesn't on
x86, and I don't remember any hacks to support large memory guests
elsewhere.

-- 
error compiling committee.c: too many arguments to function



Re: [Qemu-devel] Re: [PATCH v2] qemu-kvm: Speed up of the dirty-bitmap-traveling

2010-02-10 Thread Alexander Graf
Avi Kivity wrote:
 On 02/10/2010 06:43 PM, Alexander Graf wrote:
   
 Out of curiosity, why? It seems like an odd interface.
   
 
   
 Because on PPC, you usually run PPC32 userspace code on a PPC64 kernel.
 Unlike with x86, there's no real benefit in using 64 bit userspace.
   
 

 btw, does 32-bit ppc qemu support large memory guests? It doesn't on
 x86, and I don't remember any hacks to support large memory guests
 elsewhere.
   


It doesn't :-). In fact, the guest we virtualize wouldn't work with > 2
GB anyways, because that needs an iommu implementation.


Alex


Re: [Qemu-devel] Re: [PATCH v2] qemu-kvm: Speed up of the dirty-bitmap-traveling

2010-02-10 Thread Avi Kivity
On 02/10/2010 06:47 PM, Alexander Graf wrote:
 Because on PPC, you usually run PPC32 userspace code on a PPC64 kernel.
 Unlike with x86, there's no real benefit in using 64 bit userspace.
   
 
   
 btw, does 32-bit ppc qemu support large memory guests? It doesn't on
 x86, and I don't remember any hacks to support large memory guests
 elsewhere.
   
 

 It doesn't :-). In fact, the guest we virtualize wouldn't work with > 2
 GB anyways, because that needs an iommu implementation.
   

Oh, so you may want to revisit the "there's no real benefit in using 64
bit userspace".

Seriously, that looks like a big deficiency. What would it take to
implement an iommu?

I imagine Anthony's latest patches are a first step in that journey.

-- 
error compiling committee.c: too many arguments to function



Re: [Qemu-devel] Re: [PATCH v2] qemu-kvm: Speed up of the dirty-bitmap-traveling

2010-02-10 Thread Alexander Graf
Avi Kivity wrote:
 On 02/10/2010 06:47 PM, Alexander Graf wrote:
   
 Because on PPC, you usually run PPC32 userspace code on a PPC64 kernel.
 Unlike with x86, there's no real benefit in using 64 bit userspace.
   
 
   
 
 btw, does 32-bit ppc qemu support large memory guests? It doesn't on
 x86, and I don't remember any hacks to support large memory guests
 elsewhere.
   
 
   
 It doesn't :-). In fact, the guest we virtualize wouldn't work with > 2
 GB anyways, because that needs an iommu implementation.
   
 

 Oh, so you may want to revisit the "there's no real benefit in using 64
 bit userspace".
   

Well, for normal users they don't. SLES11 is 64-bit only, so we're good
on that. But openSUSE uses 32-bit userland.

 Seriously, that looks like a big deficiency. What would it take to
 implement an iommu?

 I imagine Anthony's latest patches are a first step in that journey.
   

All reads/writes from PCI devices would need to go through a wrapper.
Maybe we could also define a per-device offset for memory accesses. That
way the overhead might be less.

Yes, Anthony's patches look like they are a really big step in that
direction.


Alex