Re: [RFC] kern/kern_timeout.c rewrite in progress

2015-01-04 Thread Hans Petter Selasky

Hi,

Please find attached an updated timeout patch, which also updates 
in-kernel clients such as cv_timedwait() to use the callout API properly. 
Previously there was some custom sleepqueue code in the callout 
subsystem. All of that has now been removed, and callouts may now be 
protected by spin locks. This lets us tear down the callback the same 
way as with regular mutexes, and a td_slpmutex has been added to struct 
thread so that td_slpcallout can be torn down atomically. Furthermore, 
the TDF_TIMOFAIL and SWT_SLEEPQTIMO states can now be removed completely.


Summary of changes:

1) Make the callout API consistent and let it support spin locks for the 
callback function. This allows an atomic callout stop of td_slpcallout 
without the need for many kernel threading quirks.


2) A callout is not allowed to migrate to another CPU if the timeout is 
restarted while its callback is executing. Callouts must be stopped, and 
optionally drained, before CPU migration is allowed.


3) Shared lock support has been removed, because it prevents atomic stop 
of the callback function.


4) A new API to drain callouts asynchronously, callout_drain_async(), 
has been added.
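
To illustrate why a lock shared between the callback and its caller makes
a stop atomic (changes 1 and 3 above), here is a toy userland model in C.
It is only a sketch under stated assumptions: the toy_* names are
hypothetical, a pthread mutex stands in for the kernel mutex or spin lock,
and none of this is the kernel callout API or code from the patch.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userland stand-in; NOT the kernel's struct callout. */
struct toy_callout {
	pthread_mutex_t *lock;	/* lock protecting the callback's state */
	bool pending;		/* armed and not yet fired */
	void (*fn)(void *);	/* callback, runs with the lock held */
	void *arg;
};

static void
toy_callout_init(struct toy_callout *c, pthread_mutex_t *lock)
{
	c->lock = lock;
	c->pending = false;
	c->fn = NULL;
	c->arg = NULL;
}

/* Arm the timer. The caller must hold c->lock. */
static void
toy_callout_reset(struct toy_callout *c, void (*fn)(void *), void *arg)
{
	assert(fn != NULL);
	c->fn = fn;
	c->arg = arg;
	c->pending = true;
}

/*
 * Disarm the timer. The caller must hold c->lock.
 * Returns true if a pending timer was cancelled.
 */
static bool
toy_callout_stop(struct toy_callout *c)
{
	bool was_pending = c->pending;

	c->pending = false;
	return (was_pending);
}

/*
 * What the timer side does on expiry: take the lock first, then
 * re-check that the timer is still armed. Returns true if the
 * callback actually ran.
 */
static bool
toy_callout_fire(struct toy_callout *c)
{
	bool ran = false;

	pthread_mutex_lock(c->lock);
	if (c->pending) {
		c->pending = false;
		c->fn(c->arg);
		ran = true;
	}
	pthread_mutex_unlock(c->lock);
	return (ran);
}

/* Example callback: count how often it was invoked. */
static void
toy_count(void *arg)
{
	++*(int *)arg;
}
```

In this model, a caller that holds the lock and calls toy_callout_stop()
either cancels the timer before the callback starts or finds that it has
already finished; the callback cannot start afterwards. A shared (read)
lock could not give that guarantee, because the callback might be running
concurrently with the thread that stops it.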


Please test and report any errors!

Patch applies to FreeBSD-11-current as of today.

Thank you!

--HPS

Index: sys/ofed/include/linux/completion.h
===================================================================
--- sys/ofed/include/linux/completion.h	(revision 276531)
+++ sys/ofed/include/linux/completion.h	(working copy)
@@ -105,7 +105,9 @@
 		if (c->done)
 			break;
 		sleepq_add(c, NULL, "completion", flags, 0);
+		sleepq_release(c);
 		sleepq_set_timeout(c, end - ticks);
+		sleepq_lock(c);
 		if (flags & SLEEPQ_INTERRUPTIBLE) {
 			if (sleepq_timedwait_sig(c, 0) != 0)
 				return (-ERESTARTSYS);
Index: sys/kern/init_main.c
===================================================================
--- sys/kern/init_main.c	(revision 276531)
+++ sys/kern/init_main.c	(working copy)
@@ -504,7 +504,8 @@
 
 	callout_init_mtx(&p->p_itcallout, &p->p_mtx, 0);
 	callout_init_mtx(&p->p_limco, &p->p_mtx, 0);
-	callout_init(&td->td_slpcallout, CALLOUT_MPSAFE);
+	mtx_init(&td->td_slpmutex, "td_slpmutex", NULL, MTX_SPIN);
+	callout_init_mtx(&td->td_slpcallout, &td->td_slpmutex, 0);
 
 	/* Create credentials. */
 	p->p_ucred = crget();
Index: sys/kern/kern_condvar.c
===================================================================
--- sys/kern/kern_condvar.c	(revision 276531)
+++ sys/kern/kern_condvar.c	(working copy)
@@ -313,15 +313,13 @@
 	DROP_GIANT();
 
 	sleepq_add(cvp, lock, cvp->cv_description, SLEEPQ_CONDVAR, 0);
+	sleepq_release(cvp);
 	sleepq_set_timeout_sbt(cvp, sbt, pr, flags);
 	if (lock != &Giant.lock_object) {
-		if (class->lc_flags & LC_SLEEPABLE)
-			sleepq_release(cvp);
 		WITNESS_SAVE(lock, lock_witness);
 		lock_state = class->lc_unlock(lock);
-		if (class->lc_flags & LC_SLEEPABLE)
-			sleepq_lock(cvp);
 	}
+	sleepq_lock(cvp);
 	rval = sleepq_timedwait(cvp, 0);
 
 #ifdef KTRACE
@@ -383,15 +381,13 @@
 
 	sleepq_add(cvp, lock, cvp->cv_description, SLEEPQ_CONDVAR |
 	    SLEEPQ_INTERRUPTIBLE, 0);
+	sleepq_release(cvp);
 	sleepq_set_timeout_sbt(cvp, sbt, pr, flags);
 	if (lock != &Giant.lock_object) {
-		if (class->lc_flags & LC_SLEEPABLE)
-			sleepq_release(cvp);
 		WITNESS_SAVE(lock, lock_witness);
 		lock_state = class->lc_unlock(lock);
-		if (class->lc_flags & LC_SLEEPABLE)
-			sleepq_lock(cvp);
 	}
+	sleepq_lock(cvp);
 	rval = sleepq_timedwait_sig(cvp, 0);
 
 #ifdef KTRACE
Index: sys/kern/kern_lock.c
===================================================================
--- sys/kern/kern_lock.c	(revision 276531)
+++ sys/kern/kern_lock.c	(working copy)
@@ -210,9 +210,11 @@
 	GIANT_SAVE();
 	sleepq_add(&lk->lock_object, NULL, wmesg, SLEEPQ_LK | (catch ?
 	    SLEEPQ_INTERRUPTIBLE : 0), queue);
-	if ((flags & LK_TIMELOCK) && timo)
+	if ((flags & LK_TIMELOCK) && timo) {
+		sleepq_release(&lk->lock_object);
 		sleepq_set_timeout(&lk->lock_object, timo);
-
+		sleepq_lock(&lk->lock_object);
+	}
 	/*
 	 * Decisional switch for real sleeping.
 	 */
Index: sys/kern/kern_switch.c
===================================================================
--- sys/kern/kern_switch.c	(revision 276531)
+++ sys/kern/kern_switch.c	(working copy)
@@ -93,8 +93,6 @@
     &DPCPU_NAME(sched_switch_stats[SWT_TURNSTILE]), "");
 SCHED_STAT_DEFINE_VAR(sleepq,
     &DPCPU_NAME(sched_switch_stats[SWT_SLEEPQ]), "");
-SCHED_STAT_DEFINE_VAR(sleepqtimo,
-    &DPCPU_NAME(sched_switch_stats[SWT_SLEEPQTIMO]), "");
 SCHED_STAT_DEFINE_VAR(relinquish,
     &DPCPU_NAME(sched_switch_stats[SWT_RELINQUISH]), "");
 SCHED_STAT_DEFINE_VAR(needresched,
Index: sys/kern/kern_synch.c
===
--- sys/kern/kern_synch.c	(revision 276531)
+++ sys/kern/kern_synch.c	(working copy)
@@ -236,13 +236,17 @@
 	 * return from cursig().
 	 */
 	sleepq_add(ident, lock, wmesg, sleepq_flags, 0);
-	if (sbt != 0)
-		sleepq_set_timeout_sbt(ident, sbt, pr, flags);
 	if (lock != NULL  class-lc_flags  LC_SLEEPABLE) {
 		

[CFT] Paravirtualized KVM clock

2015-01-04 Thread Bryan Venteicher
For the last few weeks, I've been working on adding support for KVM clock
in the projects/paravirt branch. Currently, a KVM VM guest will end up
selecting either the HPET or ACPI as the timecounter source. Unfortunately,
this is very costly since every timecounter fetch causes a VM exit. KVM
clock allows the guest to use the TSC instead; it is very similar to the
existing Xen timer.
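
As a rough illustration of how a paravirtual clock turns a raw TSC reading
into nanoseconds without a VM exit, here is a minimal userland sketch of
the pvclock arithmetic: the hypervisor publishes a reference point plus a
scale factor, and the guest scales the TSC delta itself. The field names
follow the usual pvclock layout, but the struct is a simplified subset and
the toy_* names are hypothetical; this is not the code from the patch, and
it omits the version check a real reader needs to detect concurrent
updates by the hypervisor.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified subset of the per-vCPU time record (illustrative only). */
struct toy_pvclock_time_info {
	uint64_t tsc_timestamp;		/* TSC when system_time was sampled */
	uint64_t system_time;		/* guest time in ns at tsc_timestamp */
	uint32_t tsc_to_system_mul;	/* 32-bit binary fraction (x / 2^32) */
	int8_t   tsc_shift;		/* shift applied to the TSC delta */
};

/*
 * Scale a 64-bit TSC delta to nanoseconds: shift it, then multiply by a
 * 32-bit fraction, i.e. compute (delta * mul_frac) >> 32.
 */
static uint64_t
toy_pvclock_scale_delta(uint64_t delta, uint32_t mul_frac, int shift)
{
	uint64_t hi, lo;

	if (shift < 0)
		delta >>= -shift;
	else
		delta <<= shift;

	/* Split the multiply so that no 128-bit type is needed. */
	hi = (delta >> 32) * mul_frac;
	lo = (delta & 0xffffffffULL) * mul_frac;
	return (hi + (lo >> 32));
}

/* Guest time in ns for a TSC value read after ti->tsc_timestamp. */
static uint64_t
toy_pvclock_ns(const struct toy_pvclock_time_info *ti, uint64_t tsc_now)
{
	uint64_t delta = tsc_now - ti->tsc_timestamp;

	return (ti->system_time + toy_pvclock_scale_delta(delta,
	    ti->tsc_to_system_mul, ti->tsc_shift));
}
```

For example, with a 2 GHz TSC each tick is 0.5 ns, so the multiplier would
be 0x80000000 (one half as a 32-bit fraction) with a shift of 0, and a
delta of 2000 ticks scales to 1000 ns.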

The performance difference between HPET/ACPI and KVMCLOCK can be dramatic:
a simple disk benchmark goes from 10K IOPS to 100K IOPS.

The patch is attached, and is also available at [1]. I'd appreciate any
testing.

Also, as part of this, I've tried to generalize a bit of our existing
hypervisor guest code, with the eventual goal of being able to support more
invasive PV operations. The patch series is viewable in Phabricator.

https://reviews.freebsd.org/D1429 - paravirt: Generalize parts of the XEN
timer code into pvclock
https://reviews.freebsd.org/D1430 - paravirt: Add interface to calculate
the TSC frequency from pvclock
https://reviews.freebsd.org/D1431 - paravirt: Add simple hypervisor
registration and detection interface
https://reviews.freebsd.org/D1432 - paravirt: Add detection of bhyve using
new hypervisor interface
https://reviews.freebsd.org/D1433 - paravirt: Add detection of VMware using
new hypervisor interface
https://reviews.freebsd.org/D1434 - paravirt: Add detection of KVM using
new hypervisor interface
https://reviews.freebsd.org/D1435 - paravirt: Add KVM clock timecounter
support

My current plan is to MFC this series to 10-STABLE, and commit a
self-contained KVM clock to the other stable branches.

[1] - https://people.freebsd.org/~bryanv/patches/kvm_clock-1.patch
diff --git a/sys/amd64/include/pvclock.h b/sys/amd64/include/pvclock.h
new file mode 100644
index 0000000..f01fac6
--- /dev/null
+++ b/sys/amd64/include/pvclock.h
@@ -0,0 +1,6 @@
+/*-
+ * This file is in the public domain.
+ */
+/* $FreeBSD$ */
+
+#include <x86/pvclock.h>
diff --git a/sys/conf/files.amd64 b/sys/conf/files.amd64
index bbbe827..7d85742 100644
--- a/sys/conf/files.amd64
+++ b/sys/conf/files.amd64
@@ -555,13 +555,17 @@ x86/isa/nmi.c			standard
 x86/isa/orm.c			optional	isa
 x86/pci/pci_bus.c		optional	pci
 x86/pci/qpi.c			optional	pci
+x86/x86/bhyve.c			standard
 x86/x86/busdma_bounce.c		standard
 x86/x86/busdma_machdep.c	standard
 x86/x86/dump_machdep.c		standard
 x86/x86/fdt_machdep.c		optional	fdt
+x86/x86/hypervisor.c		standard
 x86/x86/identcpu.c		standard
 x86/x86/intr_machdep.c		standard
 x86/x86/io_apic.c		standard
+x86/x86/kvm.c			standard
+x86/x86/kvm_clock.c		standard
 x86/x86/legacy.c		standard
 x86/x86/local_apic.c		standard
 x86/x86/mca.c			standard
@@ -569,8 +573,10 @@ x86/x86/mptable.c		optional	mptable
 x86/x86/mptable_pci.c		optional	mptable pci
 x86/x86/msi.c			optional	pci
 x86/x86/nexus.c			standard
+x86/x86/pvclock.c		standard
 x86/x86/tsc.c			standard
 x86/x86/delay.c			standard
+x86/x86/vmware.c		standard
 x86/xen/hvm.c			optional	xenhvm
 x86/xen/xen_intr.c		optional	xen | xenhvm
 x86/xen/pv.c			optional	xenhvm
diff --git a/sys/conf/files.i386 b/sys/conf/files.i386
index 96879b8..ca83c4c 100644
--- a/sys/conf/files.i386
+++ b/sys/conf/files.i386
@@ -573,13 +573,17 @@ x86/isa/nmi.c			standard
 x86/isa/orm.c			optional isa
 x86/pci/pci_bus.c		optional pci
 x86/pci/qpi.c			optional pci
+x86/x86/bhyve.c			standard
 x86/x86/busdma_bounce.c		standard
 x86/x86/busdma_machdep.c	standard
 x86/x86/dump_machdep.c		standard
 x86/x86/fdt_machdep.c		optional fdt
+x86/x86/hypervisor.c		standard
 x86/x86/identcpu.c		standard
 x86/x86/intr_machdep.c		standard
 x86/x86/io_apic.c		optional apic
+x86/x86/kvm.c			standard
+x86/x86/kvm_clock.c		standard
 x86/x86/legacy.c		optional native
 x86/x86/local_apic.c		optional apic
 x86/x86/mca.c			standard
@@ -588,7 +592,9 @@ x86/x86/mptable_pci.c		optional apic native pci
 x86/x86/msi.c			optional apic pci
 x86/x86/nexus.c			standard
 x86/x86/tsc.c			standard
+x86/x86/pvclock.c		standard
 x86/x86/delay.c			standard
+x86/x86/vmware.c		standard
 x86/xen/hvm.c			optional xenhvm
 x86/xen/xen_intr.c		optional xen | xenhvm
 x86/xen/xen_apic.c		optional xenhvm
diff --git a/sys/dev/xen/timer/timer.c b/sys/dev/xen/timer/timer.c
index 5743076..53aff0a 100644
--- a/sys/dev/xen/timer/timer.c
+++ b/sys/dev/xen/timer/timer.c
@@ -59,6 +59,7 @@ __FBSDID("$FreeBSD$");
 #include <machine/clock.h>
 #include <machine/_inttypes.h>
 #include <machine/smp.h>
+#include <machine/pvclock.h>
 
 #include <dev/xen/timer/timer.h>
 
@@ -95,9 +96,6 @@ struct xentimer_softc {
 	struct eventtimer et;
 };
 
-/* Last time; this guarantees a monotonically increasing clock. */
-volatile uint64_t xen_timer_last_time = 0;
-
 static void
 xentimer_identify(driver_t *driver, device_t parent)
 {
@@ -148,128 +146,20 @@ xentimer_probe(device_t dev)
 	return (BUS_PROBE_NOWILDCARD);
 }
 
-/*
- * Scale a 64-bit delta by scaling and multiplying by a 32-bit fraction,
- * yielding a 64-bit result.
- */
-static inline uint64_t

Re: [RFC] kern/kern_timeout.c rewrite in progress

2015-01-04 Thread Adrian Chadd
Hi!

Can you throw this into reviews.freebsd.org please? This is something
that should be very closely reviewed and tested.

(I'm going to go over this quite closely as it relates to a lot of the
random crap I do ..)


-adrian


_______________________________________________
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: [CFT] Paravirtualized KVM clock

2015-01-04 Thread Adrian Chadd
... so, out of pure curiosity - what's making the benchmark go
faster? Is it the userland side of things calling clock methods, or
something in the kernel, or both?



-adrian




Re: [RFC] kern/kern_timeout.c rewrite in progress

2015-01-04 Thread Hans Petter Selasky

On 01/04/15 19:58, Adrian Chadd wrote:
> Hi!
>
> Can you throw this into reviews.freebsd.org please? This is something
> that should be very closely reviewed and tested.
>
> (I'm going to go over this quite closely as it relates to a lot of the
> random crap I do ..)



Hi Adrian,

Here you go:

https://reviews.freebsd.org/D1438

Thank you for taking the time to review this!

--HPS


BOOTP_SETTLE_DELAY in sys/nfs/bootp_subr.c ?

2015-01-04 Thread Luigi Rizzo
[I realize this is code from 15 years ago, so I am not sure if anyone
still knows or remembers the answer]

sys/nfs/bootp_subr.c is used to request an address and a boot path
via bootp or dhcp. The negotiation is done in a loop, and apparently
when replies have been received on _all_ interfaces, the code extends
the loop by another 3 seconds (BOOTP_SETTLE_DELAY), with logic that
is not documented and that I do not follow.

Any idea?

I would understand not stopping at the first reply in case we want to
pick the 'best' one from multiple responses (which is implemented,
to some degree, in bootpc_received()). But if that is the case,
one should either
1) use an unconditionally large timeout, or
2) take the first incoming packet (not necessarily valid) on _any_
interface as a signal that this interface is now up, and apply
the grace period from there.

Why do I care? I am booting a diskless kernel with bhyve, and
BOOTP_SETTLE_DELAY unnecessarily extends the boot time a lot;
even worse delays happen if you have multiple interfaces
that do not respond, due to some other unclear logic.

Depending on what the original intention was, I would like
to implement either option #1 or #2 above.
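
The two options above can be sketched as loop-termination predicates. This
is only an illustration under stated assumptions: the toy_* names and the
state structure are hypothetical, times are plain seconds, and nothing
here is taken from sys/nfs/bootp_subr.c.

```c
#include <assert.h>
#include <stdbool.h>

/* Grace period in seconds; mirrors the 3 s value mentioned above. */
#define	TOY_SETTLE_DELAY	3

/* Hypothetical bookkeeping for option #2. */
struct toy_bootp_state {
	bool saw_packet;	/* any packet seen on any interface yet? */
	long first_packet_time;	/* time of that first packet, in seconds */
};

/* Record an incoming packet; only the first one starts the grace period. */
static void
toy_note_packet(struct toy_bootp_state *st, long now)
{
	if (!st->saw_packet) {
		st->saw_packet = true;
		st->first_packet_time = now;
	}
}

/* Option #1: stop only once a fixed, unconditional timeout has elapsed. */
static bool
toy_done_fixed(long start, long now, long timeout)
{
	return (now - start >= timeout);
}

/* Option #2: stop once the grace period after the first packet is over. */
static bool
toy_done_grace(const struct toy_bootp_state *st, long now)
{
	return (st->saw_packet &&
	    now - st->first_packet_time >= TOY_SETTLE_DELAY);
}
```

With option #2 the wait is bounded by the first sign of life on any
interface rather than by replies on all of them; in practice one would
presumably still want an overall timeout as in option #1 for the case
where no packet ever arrives.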

Also, I would like to use environment variables to set/override
the in-kernel bootp settings

cheers
luigi


Re: any primer on running bhyve guests sharing disk with host ?

2015-01-04 Thread Luigi Rizzo
On Sat, Jan 03, 2015 at 11:00:13AM -0800, Neel Natu wrote:
> Hi Luigi,
>
> On Sat, Jan 3, 2015 at 8:15 AM, Luigi Rizzo ri...@iet.unipi.it wrote:
> > Hi,
> > in order to do some kernel testing, I would like to run bhyve guests
> > using (through NFS, probably) the host's file system.
> > diskless(8) is probably one way to go, i was wondering if
> > someone has instructions for that.
> > Specifically:
> > - how to bhyveload a kernel (rather than the full disk image);
> >   as an alternative, given a kernel, something to build an image
> >   that can be passed to bhyveload
> >
>
> You can use the -h option to bhyveload(8) to do this.

thank you, I have it up and running now.
For the record, this is what I am using:

sudo bhyveload -m 512 -h /tmp/diskless vm1

and in /tmp/diskless I have the following:

    boot/
        loader.rc:
            set hint.uart.0.at=isa
            set hint.uart.0.port=0x3F8
            set hint.uart.0.flags=0x10
            set vfs.root.mountfrom=nfs:192.168.1.126:/
            boot /boot/kernel.diskless
        kernel.diskless

The 'set' commands in loader.rc are enough to have the serial
console and the root path detected.
They could also be given through -e options to bhyveload, so in the end
you only need to put a suitable kernel into /some/place/boot/kernel/kernel
and call bhyveload -h /some/place -e hint.uart.0.at=isa ...

Current issues which I am investigating:
- for some reason the guest sends packets with invalid UDP checksums
  over vtnet0, which can be solved by removing (in if_vtnet.c)
  TXCSUM from if_capenable.

- when using NFS root there seems to be no way to avoid the dhcp phase,
  which is unfortunate because it adds unnecessary delays to the boot.
  This can probably be fixed easily, because there are already kenv
  variables (boot.netif.name and friends) for this purpose.

cheers
luigi


Re: [CFT] Paravirtualized KVM clock

2015-01-04 Thread Jim Harris
On Sun, Jan 4, 2015 at 12:00 PM, Adrian Chadd adr...@freebsd.org wrote:

> ... so, out of pure curiosity - what's making the benchmark go
> faster? Is it userland side of things calling clock methods, or
> something in the kernel, or both?


Most likely GEOM statistics gathering in the kernel, but Bryan would have
to confirm.

I intermittently saw this same kind of massive slowdown in nvme(4)
performance a couple of years back due to a bug in the TSC self-check code
which has since been fixed.  The bug would result in falling back to HPET
and all of the clock calls from the GEOM code for each I/O would kill
performance.
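
As a rough way to see how expensive a single clock read is with a given
timecounter, the following userland sketch times a loop of
clock_gettime(CLOCK_MONOTONIC) calls. It is only an illustration: the
toy_* names are hypothetical, and the userland path may differ from the
in-kernel clock reads that GEOM performs (a fast userland time path, where
available, can avoid the expensive read entirely), so the number is an
indication rather than a measurement of the kernel path.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L	/* for clock_gettime() */
#endif

#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Current monotonic time in nanoseconds. */
static uint64_t
toy_now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return ((uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec);
}

/* Average cost, in nanoseconds, of one clock read over iters reads. */
static double
toy_clock_read_cost_ns(unsigned long iters)
{
	volatile uint64_t sink = 0;	/* keeps the reads from being elided */
	uint64_t start, end;
	unsigned long i;

	assert(iters > 0);
	start = toy_now_ns();
	for (i = 0; i < iters; i++)
		sink += toy_now_ns();
	end = toy_now_ns();
	(void)sink;
	return ((double)(end - start) / (double)iters);
}
```

Calling toy_clock_read_cost_ns() with a large iteration count, once per
setting of the kern.timecounter.hardware sysctl, should give a feel for
why several clock reads per I/O can dominate when each read is slow.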





Re: [CFT] Paravirtualized KVM clock

2015-01-04 Thread Bryan Venteicher
On Sun, Jan 4, 2015 at 8:01 PM, Jim Harris jim.har...@gmail.com wrote:

> On Sun, Jan 4, 2015 at 12:00 PM, Adrian Chadd adr...@freebsd.org wrote:
>
>> ... so, out of pure curiosity - what's making the benchmark go
>> faster? Is it userland side of things calling clock methods, or
>> something in the kernel, or both?
>
> Most likely GEOM statistic gathering in the kernel but Bryan would have to
> confirm.


Yes - that's the main source. A similar issue exists in the network stack
(BPF).

I haven't looked into, or thought much about, whether it makes sense or is
even possible to use kvmclock in userland too (I think kib@ added fast
gettimeofday() & friends support a few years back).


> I intermittently saw this same kind of massive slowdown in nvme(4)
> performance a couple of years back due to a bug in the TSC self-check code
> which has since been fixed.  The bug would result in falling back to HPET
> and all of the clock calls from the GEOM code for each I/O would kill
> performance.




Jenkins build became unstable: FreeBSD_HEAD-tests2 #518

2015-01-04 Thread jenkins-admin
See https://jenkins.freebsd.org/job/FreeBSD_HEAD-tests2/518/
