Re: [RFC 2/2] x86, vdso, pvclock: Simplify and speed up the vdso pvclock reader

2015-02-26 Thread Andy Lutomirski
On Thu, Jan 8, 2015 at 2:43 PM, Andy Lutomirski l...@amacapital.net wrote: On Thu, Jan 8, 2015 at 2:31 PM, Marcelo Tosatti mtosa...@redhat.com wrote: On Tue, Jan 06, 2015 at 11:49:09AM -0800, Andy Lutomirski wrote: On Tue, Jan 6, 2015 at 10:45 AM, Marcelo Tosatti mtosa...@redhat.com wrote:

Re: [RFC 2/2] x86, vdso, pvclock: Simplify and speed up the vdso pvclock reader

2015-01-08 Thread Marcelo Tosatti
On Tue, Jan 06, 2015 at 11:49:09AM -0800, Andy Lutomirski wrote: On Tue, Jan 6, 2015 at 10:45 AM, Marcelo Tosatti mtosa...@redhat.com wrote: On Tue, Jan 06, 2015 at 10:26:22AM -0800, Andy Lutomirski wrote: On Tue, Jan 6, 2015 at 10:13 AM, Marcelo Tosatti mtosa...@redhat.com wrote: On

Re: [RFC 2/2] x86, vdso, pvclock: Simplify and speed up the vdso pvclock reader

2015-01-08 Thread Andy Lutomirski
On Thu, Jan 8, 2015 at 2:31 PM, Marcelo Tosatti mtosa...@redhat.com wrote: On Tue, Jan 06, 2015 at 11:49:09AM -0800, Andy Lutomirski wrote: On Tue, Jan 6, 2015 at 10:45 AM, Marcelo Tosatti mtosa...@redhat.com wrote: On Tue, Jan 06, 2015 at 10:26:22AM -0800, Andy Lutomirski wrote: On Tue, Jan

Re: [Xen-devel] [RFC 2/2] x86, vdso, pvclock: Simplify and speed up the vdso pvclock reader

2015-01-08 Thread David Vrabel
On 23/12/2014 00:39, Andy Lutomirski wrote: The pvclock vdso code was too abstracted to understand easily and excessively paranoid. Simplify it for a huge speedup. This opens the door for additional simplifications, as the vdso no longer accesses the pvti for any vcpu other than vcpu 0.

Re: [RFC 2/2] x86, vdso, pvclock: Simplify and speed up the vdso pvclock reader

2015-01-07 Thread Paolo Bonzini
On 07/01/2015 08:18, Andy Lutomirski wrote: Thus far, I've been told unambiguously that a guest can't observe pvti while it's being written, and I think you're now telling me that this isn't true and that a guest *can* observe pvti while it's being written while the low bit of the

Re: [RFC 2/2] x86, vdso, pvclock: Simplify and speed up the vdso pvclock reader

2015-01-07 Thread Marcelo Tosatti
On Tue, Jan 06, 2015 at 11:18:21PM -0800, Andy Lutomirski wrote: On Tue, Jan 6, 2015 at 9:38 PM, Paolo Bonzini pbonz...@redhat.com wrote: On 06/01/2015 17:56, Andy Lutomirski wrote: Still no good. We can migrate a bunch of times so we see the same CPU all three times There are no

Re: [RFC 2/2] x86, vdso, pvclock: Simplify and speed up the vdso pvclock reader

2015-01-06 Thread Paolo Bonzini
On 05/01/2015 20:17, Marcelo Tosatti wrote: But there is no guarantee that vCPU-N has updated its pvti when vCPU-M resumes guest instruction execution. You're right. So the cost this patch removes is mainly from __getcpu (==RDTSCP?) ? Perhaps you can use Gleb's idea to stick vcpu id into

Re: [RFC 2/2] x86, vdso, pvclock: Simplify and speed up the vdso pvclock reader

2015-01-06 Thread Paolo Bonzini
On 05/01/2015 23:48, Marcelo Tosatti wrote: But there is no guarantee that vCPU-N has updated its pvti when vCPU-M resumes guest instruction execution. Still confused. So we can freeze all vCPUs in the host, then update pvti 1, then resume vCPU 1, then update pvti 0? In that case,

Re: [RFC 2/2] x86, vdso, pvclock: Simplify and speed up the vdso pvclock reader

2015-01-06 Thread Paolo Bonzini
On 06/01/2015 09:42, Paolo Bonzini wrote: Still confused. So we can freeze all vCPUs in the host, then update pvti 1, then resume vCPU 1, then update pvti 0? In that case, we have a problem, because vCPU 1 can observe pvti 0 mid-update, and KVM doesn't increment the version

Re: [Xen-devel] [RFC 2/2] x86, vdso, pvclock: Simplify and speed up the vdso pvclock reader

2015-01-06 Thread Konrad Rzeszutek Wilk
On Mon, Jan 05, 2015 at 10:56:07AM -0800, Andy Lutomirski wrote: On Mon, Jan 5, 2015 at 7:25 AM, Marcelo Tosatti mtosa...@redhat.com wrote: On Mon, Dec 22, 2014 at 04:39:57PM -0800, Andy Lutomirski wrote: The pvclock vdso code was too abstracted to understand easily and excessively

Re: [RFC 2/2] x86, vdso, pvclock: Simplify and speed up the vdso pvclock reader

2015-01-06 Thread Marcelo Tosatti
On Tue, Jan 06, 2015 at 10:26:22AM -0800, Andy Lutomirski wrote: On Tue, Jan 6, 2015 at 10:13 AM, Marcelo Tosatti mtosa...@redhat.com wrote: On Tue, Jan 06, 2015 at 08:56:40AM -0800, Andy Lutomirski wrote: On Jan 6, 2015 4:01 AM, Paolo Bonzini pbonz...@redhat.com wrote: On

Re: [RFC 2/2] x86, vdso, pvclock: Simplify and speed up the vdso pvclock reader

2015-01-06 Thread Andy Lutomirski
On Tue, Jan 6, 2015 at 10:45 AM, Marcelo Tosatti mtosa...@redhat.com wrote: On Tue, Jan 06, 2015 at 10:26:22AM -0800, Andy Lutomirski wrote: On Tue, Jan 6, 2015 at 10:13 AM, Marcelo Tosatti mtosa...@redhat.com wrote: On Tue, Jan 06, 2015 at 08:56:40AM -0800, Andy Lutomirski wrote: On Jan 6,

Re: [RFC 2/2] x86, vdso, pvclock: Simplify and speed up the vdso pvclock reader

2015-01-06 Thread Marcelo Tosatti
On Tue, Jan 06, 2015 at 11:49:09AM -0800, Andy Lutomirski wrote: What is the point with the new flags bit though? To try to work around the problem on old hosts. I'm not at all convinced that this is worthwhile or that it helps, though. Don't think so. Just fix the host bug. Also, if

Re: [RFC 2/2] x86, vdso, pvclock: Simplify and speed up the vdso pvclock reader

2015-01-06 Thread Andy Lutomirski
On Tue, Jan 6, 2015 at 12:20 PM, Marcelo Tosatti mtosa...@redhat.com wrote: On Tue, Jan 06, 2015 at 11:49:09AM -0800, Andy Lutomirski wrote: What is the point with the new flags bit though? To try to work around the problem on old hosts. I'm not at all convinced that this is worthwhile or

Re: [RFC 2/2] x86, vdso, pvclock: Simplify and speed up the vdso pvclock reader

2015-01-06 Thread Paolo Bonzini
On 06/01/2015 17:56, Andy Lutomirski wrote: Still no good. We can migrate a bunch of times so we see the same CPU all three times There are no three times. The CPU you see here: // ... compute nanoseconds from pvti and tsc ... rmb(); } while(v != pvti-version); is the

Re: [RFC 2/2] x86, vdso, pvclock: Simplify and speed up the vdso pvclock reader

2015-01-06 Thread Paolo Bonzini
On 06/01/2015 19:26, Andy Lutomirski wrote: Don't you stil need: version++; write the rest; version++; with possible smp_wmb() in there to keep the compiler from messing around? No, see my other reply. Separating the version write is a real bug, but that should be all that it's

Re: [RFC 2/2] x86, vdso, pvclock: Simplify and speed up the vdso pvclock reader

2015-01-06 Thread Andy Lutomirski
On Tue, Jan 6, 2015 at 9:38 PM, Paolo Bonzini pbonz...@redhat.com wrote: On 06/01/2015 17:56, Andy Lutomirski wrote: Still no good. We can migrate a bunch of times so we see the same CPU all three times There are no three times. The CPU you see here: // ... compute nanoseconds

Re: [RFC 2/2] x86, vdso, pvclock: Simplify and speed up the vdso pvclock reader

2015-01-06 Thread Andy Lutomirski
On Tue, Jan 6, 2015 at 10:13 AM, Marcelo Tosatti mtosa...@redhat.com wrote: On Tue, Jan 06, 2015 at 08:56:40AM -0800, Andy Lutomirski wrote: On Jan 6, 2015 4:01 AM, Paolo Bonzini pbonz...@redhat.com wrote: On 06/01/2015 09:42, Paolo Bonzini wrote: Still confused. So we can freeze

Re: [RFC 2/2] x86, vdso, pvclock: Simplify and speed up the vdso pvclock reader

2015-01-06 Thread Andy Lutomirski
On Jan 6, 2015 4:01 AM, Paolo Bonzini pbonz...@redhat.com wrote: On 06/01/2015 09:42, Paolo Bonzini wrote: Still confused. So we can freeze all vCPUs in the host, then update pvti 1, then resume vCPU 1, then update pvti 0? In that case, we have a problem, because vCPU 1 can

Re: [RFC 2/2] x86, vdso, pvclock: Simplify and speed up the vdso pvclock reader

2015-01-06 Thread Marcelo Tosatti
On Tue, Jan 06, 2015 at 08:56:40AM -0800, Andy Lutomirski wrote: On Jan 6, 2015 4:01 AM, Paolo Bonzini pbonz...@redhat.com wrote: On 06/01/2015 09:42, Paolo Bonzini wrote: Still confused. So we can freeze all vCPUs in the host, then update pvti 1, then resume vCPU 1, then

Re: [RFC 2/2] x86, vdso, pvclock: Simplify and speed up the vdso pvclock reader

2015-01-05 Thread Marcelo Tosatti
On Mon, Jan 05, 2015 at 10:56:07AM -0800, Andy Lutomirski wrote: On Mon, Jan 5, 2015 at 7:25 AM, Marcelo Tosatti mtosa...@redhat.com wrote: On Mon, Dec 22, 2014 at 04:39:57PM -0800, Andy Lutomirski wrote: The pvclock vdso code was too abstracted to understand easily and excessively

Re: [RFC 2/2] x86, vdso, pvclock: Simplify and speed up the vdso pvclock reader

2015-01-05 Thread Andy Lutomirski
On Mon, Jan 5, 2015 at 11:17 AM, Marcelo Tosatti mtosa...@redhat.com wrote: On Mon, Jan 05, 2015 at 10:56:07AM -0800, Andy Lutomirski wrote: On Mon, Jan 5, 2015 at 7:25 AM, Marcelo Tosatti mtosa...@redhat.com wrote: On Mon, Dec 22, 2014 at 04:39:57PM -0800, Andy Lutomirski wrote: The pvclock

Re: [RFC 2/2] x86, vdso, pvclock: Simplify and speed up the vdso pvclock reader

2015-01-05 Thread Andy Lutomirski
On Mon, Jan 5, 2015 at 2:48 PM, Marcelo Tosatti mtosa...@redhat.com wrote: On Mon, Jan 05, 2015 at 02:38:46PM -0800, Andy Lutomirski wrote: On Mon, Jan 5, 2015 at 11:17 AM, Marcelo Tosatti mtosa...@redhat.com wrote: On Mon, Jan 05, 2015 at 10:56:07AM -0800, Andy Lutomirski wrote: On Mon, Jan

Re: [RFC 2/2] x86, vdso, pvclock: Simplify and speed up the vdso pvclock reader

2015-01-05 Thread Paolo Bonzini
On 05/01/2015 19:56, Andy Lutomirski wrote: 1) State: all pvtis marked as PVCLOCK_TSC_STABLE_BIT. 1) Update request for all vcpus, for a TSC_STABLE_BIT - ~TSC_STABLE_BIT transition. 2) vCPU-1 updates its pvti with new values. 3) vCPU-0 still has not updated its pvti with new values.

Re: [RFC 2/2] x86, vdso, pvclock: Simplify and speed up the vdso pvclock reader

2015-01-05 Thread Marcelo Tosatti
On Mon, Jan 05, 2015 at 02:38:46PM -0800, Andy Lutomirski wrote: On Mon, Jan 5, 2015 at 11:17 AM, Marcelo Tosatti mtosa...@redhat.com wrote: On Mon, Jan 05, 2015 at 10:56:07AM -0800, Andy Lutomirski wrote: On Mon, Jan 5, 2015 at 7:25 AM, Marcelo Tosatti mtosa...@redhat.com wrote: On

Re: [RFC 2/2] x86, vdso, pvclock: Simplify and speed up the vdso pvclock reader

2015-01-05 Thread Andy Lutomirski
On Mon, Jan 5, 2015 at 7:25 AM, Marcelo Tosatti mtosa...@redhat.com wrote: On Mon, Dec 22, 2014 at 04:39:57PM -0800, Andy Lutomirski wrote: The pvclock vdso code was too abstracted to understand easily and excessively paranoid. Simplify it for a huge speedup. This opens the door for

Re: [RFC 2/2] x86, vdso, pvclock: Simplify and speed up the vdso pvclock reader

2015-01-05 Thread Marcelo Tosatti
On Mon, Dec 22, 2014 at 04:39:57PM -0800, Andy Lutomirski wrote: The pvclock vdso code was too abstracted to understand easily and excessively paranoid. Simplify it for a huge speedup. This opens the door for additional simplifications, as the vdso no longer accesses the pvti for any vcpu

Re: [RFC 2/2] x86, vdso, pvclock: Simplify and speed up the vdso pvclock reader

2014-12-24 Thread David Matlack
On Mon, Dec 22, 2014 at 4:39 PM, Andy Lutomirski l...@amacapital.net wrote: The pvclock vdso code was too abstracted to understand easily and excessively paranoid. Simplify it for a huge speedup. This opens the door for additional simplifications, as the vdso no longer accesses the pvti for

Re: [RFC 2/2] x86, vdso, pvclock: Simplify and speed up the vdso pvclock reader

2014-12-24 Thread Andy Lutomirski
On Wed, Dec 24, 2014 at 1:30 PM, David Matlack dmatl...@google.com wrote: On Mon, Dec 22, 2014 at 4:39 PM, Andy Lutomirski l...@amacapital.net wrote: The pvclock vdso code was too abstracted to understand easily and excessively paranoid. Simplify it for a huge speedup. This opens the door

Re: [Xen-devel] [RFC 2/2] x86, vdso, pvclock: Simplify and speed up the vdso pvclock reader

2014-12-23 Thread David Vrabel
On 23/12/14 00:39, Andy Lutomirski wrote: The pvclock vdso code was too abstracted to understand easily and excessively paranoid. Simplify it for a huge speedup. This opens the door for additional simplifications, as the vdso no longer accesses the pvti for any vcpu other than vcpu 0.

Re: [Xen-devel] [RFC 2/2] x86, vdso, pvclock: Simplify and speed up the vdso pvclock reader

2014-12-23 Thread Boris Ostrovsky
On 12/22/2014 07:39 PM, Andy Lutomirski wrote: The pvclock vdso code was too abstracted to understand easily and excessively paranoid. Simplify it for a huge speedup. This opens the door for additional simplifications, as the vdso no longer accesses the pvti for any vcpu other than vcpu 0.

Re: [Xen-devel] [RFC 2/2] x86, vdso, pvclock: Simplify and speed up the vdso pvclock reader

2014-12-23 Thread Paolo Bonzini
On 23/12/2014 16:14, Boris Ostrovsky wrote: +do { +version = pvti-version; + +/* This is also a read barrier, so we'll read version first. */ +rdtsc_barrier(); +tsc = __native_read_tsc(); This will cause VMEXIT on Xen with TSC_MODE_ALWAYS_EMULATE

Re: [Xen-devel] [RFC 2/2] x86, vdso, pvclock: Simplify and speed up the vdso pvclock reader

2014-12-23 Thread Boris Ostrovsky
On 12/23/2014 10:14 AM, Paolo Bonzini wrote: On 23/12/2014 16:14, Boris Ostrovsky wrote: +do { +version = pvti-version; + +/* This is also a read barrier, so we'll read version first. */ +rdtsc_barrier(); +tsc = __native_read_tsc(); This will cause VMEXIT

[RFC 2/2] x86, vdso, pvclock: Simplify and speed up the vdso pvclock reader

2014-12-22 Thread Andy Lutomirski
The pvclock vdso code was too abstracted to understand easily and excessively paranoid. Simplify it for a huge speedup. This opens the door for additional simplifications, as the vdso no longer accesses the pvti for any vcpu other than vcpu 0. Before, vclock_gettime using kvm-clock took about