Re: [RFC] need to improve slot creation/destruction? -- Re: [RFC][PATCH] srcu: Implement call_srcu()

2012-02-14 Thread Avi Kivity
On 02/10/2012 03:25 PM, Takuya Yoshikawa wrote: Avi Kivity a...@redhat.com wrote: 2. When we create(and shift?) a memory slot, we call kvm_arch_flush_shadow() to clear all mmio sptes, again not restricted to that slot. /* * If the new memory slot is created, we need to

Re: [RFC] need to improve slot creation/destruction? -- Re: [RFC][PATCH] srcu: Implement call_srcu()

2012-02-14 Thread Avi Kivity
On 02/10/2012 07:16 PM, Marcelo Tosatti wrote: On Thu, Feb 09, 2012 at 04:25:36PM +0200, Avi Kivity wrote: On 02/08/2012 08:45 PM, Marcelo Tosatti wrote: BTW do we really need fast slot creation/destruction? At the moment yes. Boot a RHEL/Fedora installation disk (or any other

Re: [RFC] need to improve slot creation/destruction? -- Re: [RFC][PATCH] srcu: Implement call_srcu()

2012-02-10 Thread Takuya Yoshikawa
Avi Kivity a...@redhat.com wrote: On 02/09/2012 04:23 PM, Avi Kivity wrote: BTW do we really need fast slot creation/destruction? Not really, but it's good to have infrastructure that copes with different workloads. If the patches keep the code simple I think it's a good thing to

Re: [RFC] need to improve slot creation/destruction? -- Re: [RFC][PATCH] srcu: Implement call_srcu()

2012-02-10 Thread Takuya Yoshikawa
Avi Kivity a...@redhat.com wrote: 2. When we create(and shift?) a memory slot, we call kvm_arch_flush_shadow() to clear all mmio sptes, again not restricted to that slot. /* * If the new memory slot is created, we need to clear all * mmio sptes. */ if

Re: [RFC] need to improve slot creation/destruction? -- Re: [RFC][PATCH] srcu: Implement call_srcu()

2012-02-10 Thread Marcelo Tosatti
On Thu, Feb 09, 2012 at 04:25:36PM +0200, Avi Kivity wrote: On 02/08/2012 08:45 PM, Marcelo Tosatti wrote: BTW do we really need fast slot creation/destruction? At the moment yes. Boot a RHEL/Fedora installation disk (or any other guest which uses SYSLINUX splash screen) and you will

Re: [RFC] need to improve slot creation/destruction? -- Re: [RFC][PATCH] srcu: Implement call_srcu()

2012-02-10 Thread Marcelo Tosatti
On Fri, Feb 10, 2012 at 10:08:12PM +0900, Takuya Yoshikawa wrote: Avi Kivity a...@redhat.com wrote: On 02/09/2012 04:23 PM, Avi Kivity wrote: BTW do we really need fast slot creation/destruction? Not really, but it's good to have infrastructure that copes with different

Re: [RFC] need to improve slot creation/destruction? -- Re: [RFC][PATCH] srcu: Implement call_srcu()

2012-02-09 Thread Takuya Yoshikawa
On Wed, 8 Feb 2012 16:45:31 -0200 Marcelo Tosatti mtosa...@redhat.com wrote: For 3: I think doing both write protection and shadow flush is unnecessary. If you enable dirty logging on a slot, certainly you have to write protect? When we enable dirty logging, yes. BTW do we really

Re: [RFC] need to improve slot creation/destruction? -- Re: [RFC][PATCH] srcu: Implement call_srcu()

2012-02-09 Thread Avi Kivity
On 02/08/2012 05:43 PM, Takuya Yoshikawa wrote: [Dropped non-kvm members from cc] Marcelo Tosatti mtosa...@redhat.com wrote: VGABIOS mode constantly destroys and creates 0xa slot, so performance is required for KVM_SET_MEM too (it can probably be fixed in qemu, but older qemu's must

Re: [RFC] need to improve slot creation/destruction? -- Re: [RFC][PATCH] srcu: Implement call_srcu()

2012-02-09 Thread Avi Kivity
On 02/09/2012 04:23 PM, Avi Kivity wrote: BTW do we really need fast slot creation/destruction? Not really, but it's good to have infrastructure that copes with different workloads. If the patches keep the code simple I think it's a good thing to have. To qualify - taking several tens of

Re: [RFC] need to improve slot creation/destruction? -- Re: [RFC][PATCH] srcu: Implement call_srcu()

2012-02-09 Thread Avi Kivity
On 02/08/2012 08:45 PM, Marcelo Tosatti wrote: BTW do we really need fast slot creation/destruction? At the moment yes. Boot a RHEL/Fedora installation disk (or any other guest which uses SYSLINUX splash screen) and you will see. Another workload that suffers is Windows XP clearing the

[RFC] need to improve slot creation/destruction? -- Re: [RFC][PATCH] srcu: Implement call_srcu()

2012-02-08 Thread Takuya Yoshikawa
[Dropped non-kvm members from cc] Marcelo Tosatti mtosa...@redhat.com wrote: VGABIOS mode constantly destroys and creates 0xa slot, so performance is required for KVM_SET_MEM too (it can probably be fixed in qemu, but older qemu's must be supported). Apart from srcu, I see some problems

Re: [RFC] need to improve slot creation/destruction? -- Re: [RFC][PATCH] srcu: Implement call_srcu()

2012-02-08 Thread Marcelo Tosatti
On Thu, Feb 09, 2012 at 12:43:20AM +0900, Takuya Yoshikawa wrote: [Dropped non-kvm members from cc] Marcelo Tosatti mtosa...@redhat.com wrote: VGABIOS mode constantly destroys and creates 0xa slot, so performance is required for KVM_SET_MEM too (it can probably be fixed in qemu,

Re: [test result] dirty logging without srcu update -- Re: [RFC][PATCH] srcu: Implement call_srcu()

2012-02-02 Thread Avi Kivity
On 02/02/2012 07:46 AM, Takuya Yoshikawa wrote: Avi Kivity a...@redhat.com wrote: That'll be great, numbers are better than speculation. Yes, I already have some good numbers to show (and some patches). Looking forward. I made a patch to see if Avi's suggestion of getting

Re: [test result] dirty logging without srcu update -- Re: [RFC][PATCH] srcu: Implement call_srcu()

2012-02-02 Thread Takuya Yoshikawa
(2012/02/02 19:10), Avi Kivity wrote:
# of dirty pages: kvm.git (ns), with this patch (ns)
1: 102,077 ns 10,105 ns
2: 47,197 ns 9,395 ns
4: 43,563 ns 9,938 ns
8: 41,239 ns

Re: [test result] dirty logging without srcu update -- Re: [RFC][PATCH] srcu: Implement call_srcu()

2012-02-02 Thread Avi Kivity
On 02/02/2012 12:21 PM, Takuya Yoshikawa wrote: (2012/02/02 19:10), Avi Kivity wrote:
# of dirty pages: kvm.git (ns), with this patch (ns)
1: 102,077 ns 10,105 ns
2: 47,197 ns 9,395 ns
4:

Re: [test result] dirty logging without srcu update -- Re: [RFC][PATCH] srcu: Implement call_srcu()

2012-02-02 Thread Takuya Yoshikawa
(2012/02/02 19:21), Avi Kivity wrote: I used unsigned int just because I wanted to use the current atomic_clear_mask() as is. We need to implement atomic_clear_mask_long() or use ... If we use cmpxchg8b/cmpxchg16b then this won't fit with the atomic_*_long family. OK, I will try. I

Re: [test result] dirty logging without srcu update -- Re: [RFC][PATCH] srcu: Implement call_srcu()

2012-02-02 Thread Avi Kivity
On 02/02/2012 12:40 PM, Takuya Yoshikawa wrote: I have one concern about correctness issue though: concurrent rmap write protection may not be safe due to delayed tlb flush ... cannot happen? What do you mean by concurrent rmap write protection?

Re: [test result] dirty logging without srcu update -- Re: [RFC][PATCH] srcu: Implement call_srcu()

2012-02-02 Thread Takuya Yoshikawa
Avi Kivity a...@redhat.com wrote: I have one concern about correctness issue though: concurrent rmap write protection may not be safe due to delayed tlb flush ... cannot happen? What do you mean by concurrent rmap write protection? Not sure, but other codes like: -

Re: [test result] dirty logging without srcu update -- Re: [RFC][PATCH] srcu: Implement call_srcu()

2012-02-02 Thread Avi Kivity
On 02/02/2012 04:44 PM, Takuya Yoshikawa wrote: Avi Kivity a...@redhat.com wrote: I have one concern about correctness issue though: concurrent rmap write protection may not be safe due to delayed tlb flush ... cannot happen? What do you mean by concurrent rmap write

Re: [RFC][PATCH] srcu: Implement call_srcu()

2012-02-01 Thread Peter Zijlstra
On Tue, 2012-01-31 at 14:24 -0800, Paul E. McKenney wrote: Can we get it back to speed by scheduling a work function on all cpus? wouldn't that force a quiescent state and allow call_srcu() to fire? In kvm's use case synchronize_srcu_expedited() is usually called when no thread

Re: [RFC][PATCH] srcu: Implement call_srcu()

2012-02-01 Thread Avi Kivity
On 02/01/2012 12:22 PM, Peter Zijlstra wrote: One of the things I was thinking of is adding a sequence counter in the per-cpu data. Using that we could do something like: unsigned int seq1 = 0, seq2 = 0, count = 0; int cpu, idx; idx = ACCESS_ONCE(sp->completions) & 1;

Re: [RFC][PATCH] srcu: Implement call_srcu()

2012-02-01 Thread Avi Kivity
On 02/01/2012 12:44 PM, Avi Kivity wrote: On 02/01/2012 12:22 PM, Peter Zijlstra wrote: One of the things I was thinking of is adding a sequence counter in the per-cpu data. Using that we could do something like: unsigned int seq1 = 0, seq2 = 0, count = 0; int cpu, idx; idx =

Re: [RFC][PATCH] srcu: Implement call_srcu()

2012-02-01 Thread Takuya Yoshikawa
(2012/02/01 19:49), Avi Kivity wrote: On 02/01/2012 12:44 PM, Avi Kivity wrote: On 02/01/2012 12:22 PM, Peter Zijlstra wrote: One of the things I was thinking of is adding a sequence counter in the per-cpu data. Using that we could do something like: unsigned int seq1 = 0, seq2 = 0, count

Re: [RFC][PATCH] srcu: Implement call_srcu()

2012-02-01 Thread Avi Kivity
On 02/01/2012 01:00 PM, Takuya Yoshikawa wrote: rcu_assign_pointer), and use atomic operations to copy and clear: word = bitmap[i] put_user(word) atomic_and(bitmap[i], ~word) This kind of thing was really slow IIRC. How about just doing: take a spin_lock copy the entire (or
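
The copy-and-clear idea quoted in this message can be illustrated with a small userspace sketch. It is only an approximation under stated assumptions: it uses C11 <stdatomic.h> rather than kernel primitives, a plain array `out` stands in for the user buffer that put_user() would fill, and the names copy_and_clear_dirty() and mark_dirty() are hypothetical and do not come from the thread or from KVM.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/*
 * Userspace sketch of "copy and clear" on a dirty bitmap.
 * Hypothetical names; `out` stands in for the user buffer that the
 * kernel would fill with put_user().
 * Returns the number of bitmap words that had at least one bit set.
 */
static size_t copy_and_clear_dirty(_Atomic unsigned long *bitmap,
                                   unsigned long *out, size_t nwords)
{
    size_t ndirty_words = 0;

    assert(bitmap != NULL && out != NULL);

    for (size_t i = 0; i < nwords; i++) {
        /* Snapshot the word; writers may still set more bits afterwards. */
        unsigned long word = atomic_load(&bitmap[i]);

        out[i] = word;  /* corresponds to put_user(word) in the email */

        if (word) {
            /* Clear only the bits just reported; newer bits survive. */
            atomic_fetch_and(&bitmap[i], ~word);
            ndirty_words++;
        }
    }
    return ndirty_words;
}

/* Writer side (assumed): mark one page dirty with an atomic OR. */
static void mark_dirty(_Atomic unsigned long *bitmap, size_t page)
{
    size_t bits = 8 * sizeof(unsigned long);

    atomic_fetch_or(&bitmap[page / bits], 1UL << (page % bits));
}
```

Because the AND only clears bits that were present in the snapshot, a page dirtied between the load and the clear stays set and is reported by the next call. The cost is one atomic read-modify-write per non-zero word, which is consistent with the concern in the message that this style can be slow.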

Re: [RFC][PATCH] srcu: Implement call_srcu()

2012-02-01 Thread Takuya Yoshikawa
(2012/02/01 20:01), Avi Kivity wrote: On 02/01/2012 01:00 PM, Takuya Yoshikawa wrote: How about just doing: take a spin_lock copy the entire (or some portions of) bitmap locally clear the bitmap unlock That means that vcpus dirtying memory also have to take that lock, and spin while the
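
The lock-based alternative quoted here (take a lock, copy the bitmap locally, clear it, unlock) might look roughly like the following userspace sketch. It assumes a C11 atomic_flag as a stand-in for a kernel spinlock, and struct dirty_log and the log_*() helpers are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical container: one lock protecting one dirty bitmap. */
struct dirty_log {
    atomic_flag lock;        /* stand-in for a kernel spinlock */
    unsigned long *bitmap;
    size_t nwords;
};

static void log_lock(struct dirty_log *log)
{
    while (atomic_flag_test_and_set_explicit(&log->lock,
                                             memory_order_acquire))
        ; /* spin until the holder releases the lock */
}

static void log_unlock(struct dirty_log *log)
{
    atomic_flag_clear_explicit(&log->lock, memory_order_release);
}

/* Writer side: every page-dirtying path must take the same lock. */
static void log_mark_dirty(struct dirty_log *log, size_t page)
{
    size_t bits = 8 * sizeof(unsigned long);

    log_lock(log);
    log->bitmap[page / bits] |= 1UL << (page % bits);
    log_unlock(log);
}

/* Reader side: copy the bitmap into `local`, then clear it, under the lock. */
static void log_snapshot(struct dirty_log *log, unsigned long *local)
{
    assert(local != NULL);

    log_lock(log);
    memcpy(local, log->bitmap, log->nwords * sizeof(unsigned long));
    memset(log->bitmap, 0, log->nwords * sizeof(unsigned long));
    log_unlock(log);
}
```

The sketch makes the objection in the reply visible: log_mark_dirty() has to acquire the lock too, so threads dirtying memory spin for as long as log_snapshot() holds it.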

Re: [RFC][PATCH] srcu: Implement call_srcu()

2012-02-01 Thread Avi Kivity
On 02/01/2012 01:12 PM, Takuya Yoshikawa wrote: (2012/02/01 20:01), Avi Kivity wrote: On 02/01/2012 01:00 PM, Takuya Yoshikawa wrote: How about just doing: take a spin_lock copy the entire (or some portions of) bitmap locally clear the bitmap unlock That means that vcpus dirtying

Re: [RFC][PATCH] srcu: Implement call_srcu()

2012-02-01 Thread Marcelo Tosatti
On Wed, Feb 01, 2012 at 12:49:57PM +0200, Avi Kivity wrote: On 02/01/2012 12:44 PM, Avi Kivity wrote: On 02/01/2012 12:22 PM, Peter Zijlstra wrote: One of the things I was thinking of is adding a sequence counter in the per-cpu data. Using that we could do something like: unsigned

Re: [RFC][PATCH] srcu: Implement call_srcu()

2012-02-01 Thread Marcelo Tosatti
On Wed, Feb 01, 2012 at 01:01:38PM +0200, Avi Kivity wrote: On 02/01/2012 01:00 PM, Takuya Yoshikawa wrote: rcu_assign_pointer), and use atomic operations to copy and clear: word = bitmap[i] put_user(word) atomic_and(bitmap[i], ~word) This kind of thing was really slow

Re: [RFC][PATCH] srcu: Implement call_srcu()

2012-02-01 Thread Paul E. McKenney
On Wed, Feb 01, 2012 at 11:22:29AM +0100, Peter Zijlstra wrote: On Tue, 2012-01-31 at 14:24 -0800, Paul E. McKenney wrote: Can we get it back to speed by scheduling a work function on all cpus? wouldn't that force a quiescent state and allow call_srcu() to fire? In kvm's use

Re: [RFC][PATCH] srcu: Implement call_srcu()

2012-02-01 Thread Takuya Yoshikawa
On Wed, 1 Feb 2012 11:43:47 -0200 Marcelo Tosatti mtosa...@redhat.com wrote: I can show you some performance numbers, this weekend, if you like. That'll be great, numbers are better than speculation. get dirty log: 5634134 ns for 262144 dirty pages 5ms (for the entire

[test result] dirty logging without srcu update -- Re: [RFC][PATCH] srcu: Implement call_srcu()

2012-02-01 Thread Takuya Yoshikawa
Avi Kivity a...@redhat.com wrote: That'll be great, numbers are better than speculation. Yes, I already have some good numbers to show (and some patches). Looking forward. I made a patch to see if Avi's suggestion of getting rid of srcu update for dirty logging is practical; tested

Re: [RFC][PATCH] srcu: Implement call_srcu()

2012-01-31 Thread Avi Kivity
On 01/31/2012 03:32 PM, Peter Zijlstra wrote: Subject: srcu: Implement call_srcu() From: Peter Zijlstra a.p.zijls...@chello.nl Date: Mon Jan 30 23:20:49 CET 2012 Implement call_srcu() by using a state machine driven by call_rcu_sched() and timer callbacks. The state machine is a direct

Re: [RFC][PATCH] srcu: Implement call_srcu()

2012-01-31 Thread Peter Zijlstra
On Tue, 2012-01-31 at 15:47 +0200, Avi Kivity wrote: They really need to return quickly to userspace, and they really need to perform some operation between rcu_assign_pointer() and returning, so no. Bugger :/ Compile tested only!! :-) How much did synchronize_srcu_expedited()

Re: [RFC][PATCH] srcu: Implement call_srcu()

2012-01-31 Thread Paul E. McKenney
On Tue, Jan 31, 2012 at 02:50:07PM +0100, Peter Zijlstra wrote: On Tue, 2012-01-31 at 15:47 +0200, Avi Kivity wrote: They really need to return quickly to userspace, and they really need to perform some operation between rcu_assign_pointer() and returning, so no. Bugger :/