Re: About virtio-scsi and\or scsi.

2015-07-30 Thread Bryan Venteicher
On Wed, Jul 29, 2015 at 4:53 PM, Eliezer Croitoru elie...@ngtech.co.il
wrote:

 I am testing a couple of VMs under KVM, and from my tests it seems that
 there might not be support for hot-plug of virtio disks or virtio-scsi
 disks in FreeBSD?



Hot plug of VirtIO block devices is not supported, but that is more
because of a lack of PCI hot plug. Hot plugging of disks to an existing
VirtIO SCSI adapter is supported.



 I wanted to make sure I correctly understand the situation FreeBSD is in
 right now.

 If anyone knows please reply.

 Thanks,
 Eliezer
 ___
 freebsd-current@freebsd.org mailing list
 http://lists.freebsd.org/mailman/listinfo/freebsd-current
 To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: [CFT] Paravirtualized KVM clock

2015-01-21 Thread Bryan Venteicher
On Wed, Jan 21, 2015 at 3:15 PM, Peter Jeremy pe...@rulingia.com wrote:

 On 2015-Jan-04 11:56:14 -0600, Bryan Venteicher 
 bry...@daemoninthecloset.org wrote:
 For the last few weeks, I've been working on adding support for KVM clock
 in the projects/paravirt branch. Currently, a KVM VM guest will end up
 selecting either the HPET or ACPI as the timecounter source.
 Unfortunately,
 this is very costly since every timecounter fetch causes a VM exit. KVM
 clock allows the guest to use the TSC instead; it is very similar to the
 existing Xen timer.

 A somewhat late response but have you looked at

 https://github.com/blitz/freebsd/commit/cdc5f872b3e48cc0dda031fc7d6bdedc65c3148f
 I've been running this[*] on a Google Compute Engine instance for about 6
 months without problems.


A goal of my work was to put a bit of infrastructure in place so FreeBSD
can support pvops across a variety of hypervisors. KVMCLOCK happens to be
about the easiest to implement, and has a decent performance win for many
situations.

I think that commit is broken on SMP guests: CPU_FOREACH() does not switch
the current CPU, so it just keeps writing to the MSR on the BSP.
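
The point about CPU_FOREACH() can be illustrated with a small user-space model. This is only a sketch under stated assumptions: every name below (sketch_wrmsr, sketch_curcpu, and so on) is hypothetical, the "MSR" is an ordinary array, and assigning sketch_curcpu merely stands in for whatever mechanism the kernel would use to actually run code on each CPU.

```c
#include <stdint.h>

#define SKETCH_NCPU 4

/* Simulated per-CPU MSR storage; the index is the CPU that ran the write. */
static uint64_t sketch_msr[SKETCH_NCPU];
/* CPU the "thread" is currently running on (0 models the BSP). */
static int sketch_curcpu = 0;

/* Like a real MSR write: it always affects the CPU executing it. */
static void
sketch_wrmsr(uint64_t val)
{
	sketch_msr[sketch_curcpu] = val;
}

/* Broken pattern: iterating over CPU ids does not migrate the thread. */
static void
sketch_enable_broken(uint64_t base)
{
	for (int cpu = 0; cpu < SKETCH_NCPU; cpu++)
		sketch_wrmsr(base + cpu);	/* lands on CPU 0 every time */
}

/* Fixed pattern: perform the write while running on each CPU in turn. */
static void
sketch_enable_fixed(uint64_t base)
{
	int saved = sketch_curcpu;

	for (int cpu = 0; cpu < SKETCH_NCPU; cpu++) {
		sketch_curcpu = cpu;	/* stand-in for really switching CPUs */
		sketch_wrmsr(base + cpu);
	}
	sketch_curcpu = saved;
}
```

In the broken variant only the first slot is ever written, and it ends up holding the value intended for the last CPU; the other CPUs never see a write.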

[*] I had to patch out the test for KVM_FEATURE_CLOCKSOURCE_STABLE_BIT but
 I think that's a GCE issue.

 --
 Peter Jeremy



Re: DigitalOcean offers VMs with FreeBSD!

2015-01-15 Thread Bryan Venteicher
On Thu, Jan 15, 2015 at 9:44 AM, Slawa Olhovchenkov s...@zxy.spb.ru wrote:

 On Thu, Jan 15, 2015 at 06:28:23PM +0300, Lev Serebryakov wrote:

  -BEGIN PGP SIGNED MESSAGE-
  Hash: SHA512
 
  On 15.01.2015 14:29, Lev Serebryakov wrote:
 
  
 https://www.digitalocean.com/company/blog/presenting-freebsd-how-we-made-it-happen/
  
I didn't see this news on mailing lists :)
   But here is a thread about FreeBSD being way slower than Linux in
  these virtual installations
 
  https://news.ycombinator.com/item?id=487

 May be IOPS quotation?
 Can you test with dd and custom kernel with MAXPHYS=1048576 ?



What's the value of kern.timecounter.hardware? It will likely be either
HPET or ACPI, which means there is a VM exit whenever the guest reads from
the emulated timecounter hardware. That's why I have some WIP to add
support for KVMCLOCK [1]. I hope to merge those changes to HEAD in a week
and STABLE shortly after.

In the meantime, a not completely foolproof workaround is to use the
TSC-low timecounter source.

[1] -
https://lists.freebsd.org/pipermail/freebsd-arch/2015-January/016587.html




[CFT] Paravirtualized KVM clock

2015-01-04 Thread Bryan Venteicher
(uint64_t delta, uint32_t mul_frac, int shift)
-{
-	uint64_t product;
-
-	if (shift < 0)
-		delta >>= -shift;
-	else
-		delta <<= shift;
-
-#if defined(__i386__)
-	{
-		uint32_t tmp1, tmp2;
-
-		/**
-		 * For i386, the formula looks like:
-		 *
-		 *   lower = (mul_frac * (delta & UINT_MAX)) >> 32
-		 *   upper = mul_frac * (delta >> 32)
-		 *   product = lower + upper
-		 */
-		__asm__ (
-			"mul  %5       ; "
-			"mov  %4,%%eax ; "
-			"mov  %%edx,%4 ; "
-			"mul  %5       ; "
-			"xor  %5,%5    ; "
-			"add  %4,%%eax ; "
-			"adc  %5,%%edx ; "
-			: "=A" (product), "=r" (tmp1), "=r" (tmp2)
-			: "a" ((uint32_t)delta), "1" ((uint32_t)(delta >> 32)),
-			  "2" (mul_frac) );
-	}
-#elif defined(__amd64__)
-	{
-		unsigned long tmp;
-
-		__asm__ (
-			"mulq %[mul_frac] ; shrd $32, %[hi], %[lo]"
-			: [lo]"=a" (product), [hi]"=d" (tmp)
-			: "0" (delta), [mul_frac]"rm"((uint64_t)mul_frac));
-	}
-#else
-#error "xentimer: unsupported architecture"
-#endif
-
-	return (product);
-}
-
-static uint64_t
-get_nsec_offset(struct vcpu_time_info *tinfo)
-{
-
-	return (scale_delta(rdtsc() - tinfo->tsc_timestamp,
-	    tinfo->tsc_to_system_mul, tinfo->tsc_shift));
-}
-
-/*
- * Read the current hypervisor system uptime value from Xen.
- * See xen/interface/xen.h for a description of how this works.
- */
-static uint32_t
-xen_fetch_vcpu_tinfo(struct vcpu_time_info *dst, struct vcpu_time_info *src)
-{
-
-	do {
-		dst->version = src->version;
-		rmb();
-		dst->tsc_timestamp = src->tsc_timestamp;
-		dst->system_time = src->system_time;
-		dst->tsc_to_system_mul = src->tsc_to_system_mul;
-		dst->tsc_shift = src->tsc_shift;
-		rmb();
-	} while ((src->version & 1) | (dst->version ^ src->version));
-
-	return (dst->version);
-}
-
 /**
  * \brief Get the current time, in nanoseconds, since the hypervisor booted.
  *
  * \param vcpu		vcpu_info structure to fetch the time from.
  *
- * \note This function returns the current CPU's idea of this value, unless
- *   it happens to be less than another CPU's previously determined value.
  */
 static uint64_t
 xen_fetch_vcpu_time(struct vcpu_info *vcpu)
 {
-	struct vcpu_time_info dst;
-	struct vcpu_time_info *src;
-	uint32_t pre_version;
-	uint64_t now;
-	volatile uint64_t last;
-
-	src = &vcpu->time;
-
-	do {
-		pre_version = xen_fetch_vcpu_tinfo(&dst, src);
-		barrier();
-		now = dst.system_time + get_nsec_offset(&dst);
-		barrier();
-	} while (pre_version != src->version);
+	struct pvclock_vcpu_time_info *time;
 
-	/*
-	 * Enforce a monotonically increasing clock time across all
-	 * VCPUs.  If our time is too old, use the last time and return.
-	 * Otherwise, try to update the last time.
-	 */
-	do {
-		last = xen_timer_last_time;
-		if (last > now) {
-			now = last;
-			break;
-		}
-	} while (!atomic_cmpset_64(&xen_timer_last_time, last, now));
+	time = (struct pvclock_vcpu_time_info *) &vcpu->time;
 
-	return (now);
+	return (pvclock_get_timecount(time));
 }
 
 static uint32_t
@@ -302,15 +192,11 @@ static void
 xen_fetch_wallclock(struct timespec *ts)
 {
 	shared_info_t *src = HYPERVISOR_shared_info;
-	uint32_t version = 0;
+	struct pvclock_wall_clock *wc;
 
-	do {
-		version = src->wc_version;
-		rmb();
-		ts->tv_sec = src->wc_sec;
-		ts->tv_nsec = src->wc_nsec;
-		rmb();
-	} while ((src->wc_version & 1) | (version ^ src->wc_version));
+	wc = (struct pvclock_wall_clock *) &src->wc_version;
+
+	pvclock_get_wallclock(wc, ts);
 }
 
 static void
@@ -574,7 +460,7 @@ xentimer_resume(device_t dev)
 	}
 
 	/* Reset the last uptime value */
-	xen_timer_last_time = 0;
+	pvclock_resume();
 
 	/* Reset the RTC clock */
 	inittodr(time_second);
diff --git a/sys/i386/include/pvclock.h b/sys/i386/include/pvclock.h
new file mode 100644
index 000..f01fac6
--- /dev/null
+++ b/sys/i386/include/pvclock.h
@@ -0,0 +1,6 @@
+/*-
+ * This file is in the public domain.
+ */
+/* $FreeBSD$ */
+
+#include <x86/pvclock.h>
diff --git a/sys/kern/subr_param.c b/sys/kern/subr_param.c
index 95f3250..5332055 100644
--- a/sys/kern/subr_param.c
+++ b/sys/kern/subr_param.c
@@ -159,6 +159,8 @@ static const char *const vm_guest_sysctl_names[] = {
 	"xen",
 	"hv",
 	"vmware",
+	"bhyve",
+	"kvm",
 	NULL
 };
 CTASSERT(nitems(vm_guest_sysctl_names) - 1 == VM_LAST);
diff --git a/sys/sys/systm.h b/sys/sys/systm.h
index d3833d0..50a49d2 100644
--- a/sys/sys/systm.h
+++ b/sys/sys/systm.h
@@ -73,7 +73,7 @@ extern int vm_guest;		/* Running as virtual machine guest? */
  * Keep in sync with vm_guest_sysctl_names[].
  */
 enum VM_GUEST { VM_GUEST_NO = 0, VM_GUEST_VM, VM_GUEST_XEN, VM_GUEST_HV,
-		VM_GUEST_VMWARE, VM_LAST };
+		VM_GUEST_VMWARE, VM_GUEST_BHYVE, VM_GUEST_KVM, VM_LAST };
 
 #if defined(WITNESS) || defined(INVARIANTS)
 void	kassert_panic(const char *fmt, ...)  __printflike(1, 2);
diff --git a/sys/x86/include/hypervisor.h b/sys/x86/include/hypervisor.h
new file mode 100644
index 000..d5d30eb
--- /dev/null
+++ b/sys/x86/include/hypervisor.h
@@ -0,0 +1,56 @@
+/*-
+ * Copyright (c) 2014 Bryan Venteicher bry...@freebsd.org
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
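
The read path that the removed Xen code implements (and that the patch moves behind pvclock_get_timecount()) can be sketched in portable C. This is an illustration under assumptions, not the code from the patch: the struct layout and all sketch_* names are hypothetical, the TSC value is passed in as a parameter instead of being read with rdtsc(), and C11 fences stand in for the kernel's rmb().

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical mirror of the per-VCPU time record (layout is not exact). */
struct sketch_time_info {
	uint32_t version;		/* odd while an update is in progress */
	uint64_t tsc_timestamp;		/* TSC value at the last update */
	uint64_t system_time;		/* nanoseconds at the last update */
	uint32_t tsc_to_system_mul;	/* 32.32 fixed-point multiplier */
	int8_t   tsc_shift;		/* shift applied to the delta first */
};

/* Portable form of "shift the delta, then (delta * mul_frac) >> 32". */
static uint64_t
sketch_scale_delta(uint64_t delta, uint32_t mul_frac, int shift)
{
	uint64_t lower, upper;

	if (shift < 0)
		delta >>= -shift;
	else
		delta <<= shift;

	/* Split the 64x32 multiply so no 128-bit type is needed. */
	lower = ((delta & 0xffffffffULL) * mul_frac) >> 32;
	upper = (delta >> 32) * mul_frac;
	return (lower + upper);
}

/* Take a consistent snapshot, retrying if the version changed or is odd. */
static uint64_t
sketch_read_nsec(const volatile struct sketch_time_info *src, uint64_t tsc_now)
{
	struct sketch_time_info snap;
	uint32_t version;

	do {
		version = src->version;
		atomic_thread_fence(memory_order_acquire);
		snap.tsc_timestamp = src->tsc_timestamp;
		snap.system_time = src->system_time;
		snap.tsc_to_system_mul = src->tsc_to_system_mul;
		snap.tsc_shift = src->tsc_shift;
		atomic_thread_fence(memory_order_acquire);
	} while ((version & 1) != 0 || version != src->version);

	return (snap.system_time + sketch_scale_delta(
	    tsc_now - snap.tsc_timestamp, snap.tsc_to_system_mul,
	    snap.tsc_shift));
}
```

The sketch leaves out the cross-VCPU monotonicity clamp that the old xen_fetch_vcpu_time() performed with atomic_cmpset_64().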

Re: [CFT] Paravirtualized KVM clock

2015-01-04 Thread Bryan Venteicher
On Sun, Jan 4, 2015 at 8:01 PM, Jim Harris jim.har...@gmail.com wrote:



 On Sun, Jan 4, 2015 at 12:00 PM, Adrian Chadd adr...@freebsd.org wrote:

 ... so, out of pure curiousity - what's making the benchmark go
 faster? Is it userland side of things calling clock methods, or
 something in the kernel, or both?


 Most likely GEOM statistic gathering in the kernel but Bryan would have to
 confirm.


Yes - that's the main source. A similar issue exists in the network stack
BPF.


I haven't looked or thought too much about whether it makes sense / is
possible to use kvmclock in userland too (I think kib@ added fast
gettimeofday & friends support a few years back).


I intermittently saw this same kind of massive slowdown in nvme(4)
 performance a couple of years back due to a bug in the TSC self-check code
 which has since been fixed.  The bug would result in falling back to HPET
 and all of the clock calls from the GEOM code for each I/O would kill
 performance.



 -adrian


 On 4 January 2015 at 09:56, Bryan Venteicher
 bry...@daemoninthecloset.org wrote:
  For the last few weeks, I've been working on adding support for KVM
 clock
  in the projects/paravirt branch. Currently, a KVM VM guest will end up
  selecting either the HPET or ACPI as the timecounter source.
 Unfortunately,
  this is very costly since every timecounter fetch causes a VM exit. KVM
  clock allows the guest to use the TSC instead; it is very similar to the
  existing Xen timer.
 
  The performance difference between HPET/ACPI and KVMCLOCK can be
 dramatic:
  a simple disk benchmark goes from 10K IOPs to 100K IOPs.
 
  The patch is attached or available at [1]. I'd appreciate any
  testing.
 
  Also as a part of this, I've tried to generalize a bit of our existing
  hypervisor guest code, with the eventual goal of being able to support
  more invasive PV operations. The patch series is viewable in Phabricator.
 
  https://reviews.freebsd.org/D1429 - paravirt: Generalize parts of the
 XEN
  timer code into pvclock
  https://reviews.freebsd.org/D1430 - paravirt: Add interface to
 calculate
  the TSC frequency from pvclock
  https://reviews.freebsd.org/D1431 - paravirt: Add simple hypervisor
  registration and detection interface
  https://reviews.freebsd.org/D1432 - paravirt: Add detection of bhyve
 using
  new hypervisor interface
  https://reviews.freebsd.org/D1433 - paravirt: Add detection of VMware
 using
  new hypervisor interface
  https://reviews.freebsd.org/D1434 - paravirt: Add detection of KVM
 using
  new hypervisor interface
  https://reviews.freebsd.org/D1435 - paravirt: Add KVM clock timecounter
  support
 
  My current plan is to MFC this series to 10-STABLE, and commit a
  self-contained KVM clock to the other stable branches.
 
  [1] - https://people.freebsd.org/~bryanv/patches/kvm_clock-1.patch
 
  ___
  freebsd-a...@freebsd.org mailing list
  http://lists.freebsd.org/mailman/listinfo/freebsd-arch
  To unsubscribe, send any mail to freebsd-arch-unsubscr...@freebsd.org
 




Re: Bug in virtio-net

2014-12-08 Thread Bryan Venteicher
On Mon, Dec 8, 2014 at 5:34 PM, Shawn Webb latt...@gmail.com wrote:

 I was running Poudriere in bhyve. I got this kernel panic. I'm on a new
 11-CURRENT as of this morning. Would this be a NULL pointer deref?

 `uname -a`: FreeBSD  11.0-CURRENT FreeBSD 11.0-CURRENT #1
 b5310d8(hardened/current/master)-dirty: Mon Dec  8 12:58:12 UTC 2014
 shawn@pkg-build-01:/usr/obj/usr/src/sys/LATT-SEC  amd64

 This bhyve VM is at r275606. The host is at r275575.

 Thanks,

 Shawn

 Kern panic backtrace:

 Fatal trap 12: page fault while in kernel mode
 cpuid = 0; apic id = 00
 fault virtual address   = 0x0
 fault code  = supervisor read instruction, page not present
 instruction pointer = 0x20:0x0
 stack pointer   = 0x28:0xfe0469a0c830
 frame pointer   = 0x28:0xfe0469a0c8b0
 code segment= base 0x0, limit 0xf, type 0x1b
 = DPL 0, pres 1, long 1, def32 0, gran 1
 processor eflags= interrupt enabled, resume, IOPL = 0
 current process = 12 (irq267: virtio_pci0)
 [ thread pid 12 tid 100040 ]
 Stopped at  0:KDB: reentering
 KDB: stack backtrace:
   db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame
 0xfe0469a0bd90
 kdb_backtrace() at kdb_backtrace+0x39/frame 0xfe0469a0be40
 kdb_reenter() at kdb_reenter+0x33/frame 0xfe0469a0be50
 trap() at trap+0x54/frame 0xfe0469a0c060
 calltrap() at calltrap+0x8/frame 0xfe0469a0c060
 --- trap 0xc, rip = 0x80e06033, rsp = 0xfe0469a0c120, rbp =
 0xfe0469a0c1c0 ---
 db_read_bytes() at db_read_bytes+0x53/frame 0xfe0469a0c1c0
 db_get_value() at db_get_value+0x38/frame 0xfe0469a0c210
 db_disasm() at db_disasm+0x23/frame 0xfe0469a0c330
 db_trap() at db_trap+0xc0/frame 0xfe0469a0c3c0
 kdb_trap() at kdb_trap+0x191/frame 0xfe0469a0c460
 trap_fatal() at trap_fatal+0x34c/frame 0xfe0469a0c4c0
 trap_pfault() at trap_pfault+0x33c/frame 0xfe0469a0c560
 trap() at trap+0x45e/frame 0xfe0469a0c770
 calltrap() at calltrap+0x8/frame 0xfe0469a0c770
 --- trap 0xc, rip = 0, rsp = 0xfe0469a0c830, rbp =
 0xfe0469a0c8b0 ---
 uart_sab82532_class() at 0/frame 0xfe0469a0c8b0
 ether_input() at ether_input+0x26/frame 0xfe0469a0c8d0
 vtnet_rxq_eof() at vtnet_rxq_eof+0x7be/frame 0xfe0469a0c9a0
 vtnet_rx_vq_intr() at vtnet_rx_vq_intr+0x94/frame 0xfe0469a0c9e0
 intr_event_execute_handlers() at intr_event_execute_handlers+0x1b8/frame
 0xfe0469a0ca20
 ithread_loop() at ithread_loop+0x96/frame 0xfe0469a0ca70
 fork_exit() at fork_exit+0x9a/frame 0xfe0469a0cab0
 fork_trampoline() at fork_trampoline+0xe/frame 0xfe0469a0cab0
 --- trap 0, rip = 0, rsp = 0xfe0469a0cb70, rbp = 0 ---



I doubt this has anything to do with vtnet. My guess is that
netisr_proto[NETISR_ETHER].np_handler(m) is NULL for some reason. Do you
have a dump?
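
A trap with rip = 0 directly beneath ether_input() is what an indirect call through a NULL function pointer would look like. The sketch below models that with a hypothetical handler table; the sketch_* names are invented for illustration and this is not the netisr code itself.

```c
#include <stddef.h>

#define SKETCH_NPROTO 4

typedef int (*sketch_handler_t)(void *pkt);

/* Hypothetical per-protocol handler table; unregistered slots stay NULL. */
static sketch_handler_t sketch_handlers[SKETCH_NPROTO];

/* Example handler that simply reports the packet as consumed. */
static int
sketch_ether_handler(void *pkt)
{
	(void)pkt;
	return (1);
}

/*
 * Guarded dispatch. Without the NULL check, calling an unregistered
 * slot would transfer control to address 0.
 */
static int
sketch_dispatch(unsigned int proto, void *pkt)
{
	if (proto >= SKETCH_NPROTO || sketch_handlers[proto] == NULL)
		return (-1);
	return (sketch_handlers[proto](pkt));
}
```

Whether the handler really was NULL here, and why, is exactly what a crash dump would confirm.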



 *** error reading from address 0 ***
 KDB: reentering
 KDB: stack backtrace:
 db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame
 0xfe0469a0c100
 kdb_backtrace() at kdb_backtrace+0x39/frame 0xfe0469a0c1b0
 kdb_reenter() at kdb_reenter+0x33/frame 0xfe0469a0c1c0
 db_get_value() at db_get_value+0x52/frame 0xfe0469a0c210
 db_disasm() at db_disasm+0x23/frame 0xfe0469a0c330
 db_trap() at db_trap+0xc0/frame 0xfe0469a0c3c0
 kdb_trap() at kdb_trap+0x191/frame 0xfe0469a0c460
 trap_fatal() at trap_fatal+0x34c/frame 0xfe0469a0c4c0
 trap_pfault() at trap_pfault+0x33c/frame 0xfe0469a0c560
 trap() at trap+0x45e/frame 0xfe0469a0c770
 calltrap() at calltrap+0x8/frame 0xfe0469a0c770
 --- trap 0xc, rip = 0, rsp = 0xfe0469a0c830, rbp =
 0xfe0469a0c8b0 ---
 uart_sab82532_class() at 0/frame 0xfe0469a0c8b0
 ether_input() at ether_input+0x26/frame 0xfe0469a0c8d0
 vtnet_rxq_eof() at vtnet_rxq_eof+0x7be/frame 0xfe0469a0c9a0
 vtnet_rx_vq_intr() at vtnet_rx_vq_intr+0x94/frame 0xfe0469a0c9e0
 intr_event_execute_handlers() at intr_event_execute_handlers+0x1b8/frame
 0xfe0469a0ca20
 ithread_loop() at ithread_loop+0x96/frame 0xfe0469a0ca70
 fork_exit() at fork_exit+0x9a/frame 0xfe0469a0cab0
 fork_trampoline() at fork_trampoline+0xe/frame 0xfe0469a0cab0
 --- trap 0, rip = 0, rsp = 0xfe0469a0cb70, rbp = 0 ---



Re: dhclient sucks cpu usage...

2014-06-10 Thread Bryan Venteicher


- Original Message -
 On 10.06.2014 07:03, Bryan Venteicher wrote:
  Hi,
 
  - Original Message -
  So, after finding out that nc has a stupidly small buffer size (2k
  even though there is space for 16k), I was still not getting as good
  as performance using nc between machines, so I decided to generate some
  flame graphs to try to identify issues...  (Thanks to who included a
  full set of modules, including dtraceall on memstick!)
 
  So, the first one is:
  https://www.funkthat.com/~jmg/em.stack.svg
 
  As I was browsing around, the em_handle_que was consuming quite a bit
  of cpu usage for only doing ~50MB/sec over gige..  Running top -SH shows
  me that the taskqueue for em was consuming about 50% cpu...  Also pretty
  high for only 50MB/sec...  Looking closer, you'll see that bpf_mtap is
  consuming ~3.18% (under ether_nh_input)..  I know I'm not running tcpdump
  or anything, but I think dhclient uses bpf to be able to inject packets
  and listen in on them, so I kill off dhclient, and instantly, the
  taskqueue
  thread for em drops down to 40% CPU... (transfer rate only marginally
  improves, if it does)
 
  I decide to run another flame graph w/o dhclient running:
  https://www.funkthat.com/~jmg/em.stack.nodhclient.svg
 
  and now _rxeof drops from 17.22% to 11.94%, pretty significant...
 
  So, if you care about performance, don't run dhclient...
 
  Yes, I've noticed the same issue. It can absolutely kill performance
  in a VM guest. It is much more pronounced on only some of my systems,
  and I hadn't tracked it down yet. I wonder if this is fallout from
  the callout work, or if there was some bpf change.
 
  I've been using the kludgey workaround patch below.
 Hm, pretty interesting.
 dhclient should setup proper filter (and it looks like it does so:
 13:10 [0] m@ptichko s netstat -B
Pid  Netif   Flags  Recv  Drop Match Sblen Hblen Command
   1224em0 -ifs--l  41225922 011 0 0 dhclient
 )
 see match count.
 And BPF itself adds the cost of read rwlock (+ bpf_filter() calls for
 each consumer on interface).
 It should not introduce significant performance penalties.
 


It will be a bit before I'm able to capture that. Here's a Flamegraph from
earlier in the year showing an absurd amount of time spent in bpf_mtap():

http://people.freebsd.org/~bryanv/vtnet/vtnet-bpf-10.svg


 
  diff --git a/sys/net/bpf.c b/sys/net/bpf.c
  index cb3ed27..9751986 100644
  --- a/sys/net/bpf.c
  +++ b/sys/net/bpf.c
  @@ -2013,9 +2013,11 @@ bpf_gettime(struct bintime *bt, int tstype, struct
  mbuf *m)
  return (BPF_TSTAMP_EXTERN);
  }
  }
  +#if 0
  if (quality == BPF_TSTAMP_NORMAL)
  binuptime(bt);
  else
  +#endif
 bpf_gettime() is called IFF packet filter matches some traffic.
 Can you show your netstat -B output ?
  getbinuptime(bt);

  return (quality);
 
 
  --
 John-Mark GurneyVoice: +1 415 225 5579
 
All that I will do, has been done, All that I have, has not.
 
  ___
  freebsd-...@freebsd.org mailing list
  http://lists.freebsd.org/mailman/listinfo/freebsd-net
  To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
 
 
 


Re: dhclient sucks cpu usage...

2014-06-09 Thread Bryan Venteicher
Hi,

- Original Message -
 So, after finding out that nc has a stupidly small buffer size (2k
 even though there is space for 16k), I was still not getting as good
 as performance using nc between machines, so I decided to generate some
 flame graphs to try to identify issues...  (Thanks to who included a
 full set of modules, including dtraceall on memstick!)
 
 So, the first one is:
 https://www.funkthat.com/~jmg/em.stack.svg
 
 As I was browsing around, the em_handle_que was consuming quite a bit
 of cpu usage for only doing ~50MB/sec over gige..  Running top -SH shows
 me that the taskqueue for em was consuming about 50% cpu...  Also pretty
 high for only 50MB/sec...  Looking closer, you'll see that bpf_mtap is
 consuming ~3.18% (under ether_nh_input)..  I know I'm not running tcpdump
 or anything, but I think dhclient uses bpf to be able to inject packets
 and listen in on them, so I kill off dhclient, and instantly, the taskqueue
 thread for em drops down to 40% CPU... (transfer rate only marginally
 improves, if it does)
 
 I decide to run another flame graph w/o dhclient running:
 https://www.funkthat.com/~jmg/em.stack.nodhclient.svg
 
 and now _rxeof drops from 17.22% to 11.94%, pretty significant...
 
 So, if you care about performance, don't run dhclient...
 

Yes, I've noticed the same issue. It can absolutely kill performance
in a VM guest. It is much more pronounced on only some of my systems,
and I hadn't tracked it down yet. I wonder if this is fallout from
the callout work, or if there was some bpf change.

I've been using the kludgey workaround patch below.

diff --git a/sys/net/bpf.c b/sys/net/bpf.c
index cb3ed27..9751986 100644
--- a/sys/net/bpf.c
+++ b/sys/net/bpf.c
@@ -2013,9 +2013,11 @@ bpf_gettime(struct bintime *bt, int tstype, struct mbuf *m)
 			return (BPF_TSTAMP_EXTERN);
 		}
 	}
+#if 0
 	if (quality == BPF_TSTAMP_NORMAL)
 		binuptime(bt);
 	else
+#endif
 		getbinuptime(bt);
 
 	return (quality);
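
Why the workaround helps can be shown with a toy model: the precise path reads the timecounter hardware for every stamped packet, while the fast path reuses a value cached at the last tick. Everything below is a hypothetical simulation (the sketch_* names and the counter are invented); in a VM guest each simulated hardware read would correspond to a costly access to emulated HPET or ACPI timer hardware.

```c
#include <stdint.h>

static uint64_t sketch_hw_counter;	/* simulated timecounter hardware */
static uint64_t sketch_hw_reads;	/* how often the hardware was read */
static uint64_t sketch_cached_time;	/* value saved at the last tick */

/* Models an expensive read of the timecounter hardware. */
static uint64_t
sketch_read_hw(void)
{
	sketch_hw_reads++;
	return (++sketch_hw_counter);
}

/* Periodic tick: the only place the fast path's value gets refreshed. */
static void
sketch_tick(void)
{
	sketch_cached_time = sketch_read_hw();
}

/* Precise timestamp: one hardware read per call. */
static uint64_t
sketch_precise_time(void)
{
	return (sketch_read_hw());
}

/* Fast timestamp: no hardware access, lower resolution. */
static uint64_t
sketch_fast_time(void)
{
	return (sketch_cached_time);
}

/* Stamp n packets with either method and return the last timestamp. */
static uint64_t
sketch_stamp_packets(int n, int precise)
{
	uint64_t last = 0;

	for (int i = 0; i < n; i++)
		last = precise ? sketch_precise_time() : sketch_fast_time();
	return (last);
}
```

The trade-off is resolution: with the fast path, all packets stamped between two ticks carry the same timestamp.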


 --
   John-Mark GurneyVoice: +1 415 225 5579
 
  All that I will do, has been done, All that I have, has not.


Re: BUG: some drivers return ENOBUFS when the mbuf is actually queued

2014-06-04 Thread Bryan Venteicher
On Wed, Jun 4, 2014 at 8:49 AM, Luigi Rizzo ri...@iet.unipi.it wrote:

 Hi,
 if I read correctly the code, there are a few network device drivers
 (igb, ixgbe, i40e, vtnet, vmxnet) where ifp->if_transmit(ifp, m)
 can return ENOBUFS even when 'm' has _not_ been dropped:

 e1000/if_igb.c :: igb_mq_start()
 can return ENOBUFS from igb_xmit()

 ixgbe/ixgbe_main.c :: ixgbe_mq_start_locked()
 can return ENOBUFS from ixgbe_xmit()
(similar for i40)

 virtio/network/if_vtnet.c :: vtnet_txq_mq_start
 can return ENOBUFS if virtqueue_full()

 In all these cases, the error comes from a later attempt to transfer
 mbufs from the buf_ring to the NIC ring.

 All drivers using if_transmit() seem correct, as well as a bunch
 of others (cxgbe, sfxge, mxge ...) that reassign if_transmit and I
 checked for correctness.

 I think that when the current buffer has been queued, returning
 ENOBUFS is extremely confusing and should not be done.

 I would also argue that the return from ifp->if_transmit(ifp, m)
 should only tell what happened to 'm', not other things
 such as the status of the queue.

 Any objections if i fix the above drivers ?


No objection for vtnet and vmxnet.


 cheers
 luigi

 (For those curious: i found this issue when using emulated
 netmap mode on top of a standard driver. The netmap emulation
 code assumes that ENOBUFS indicates that the driver has
 m_free()'d the mbuf, same as it happens on linux, and the
 bug was causing panics in my system).
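
The proposed contract, where the return value describes only what happened to 'm', can be sketched with a toy transmit queue. All names here are hypothetical and the queue is deliberately simplistic; it is not taken from any of the drivers listed.

```c
#include <errno.h>
#include <stdlib.h>

#define SKETCH_QLEN 2

struct sketch_pkt {
	int id;
};

struct sketch_txq {
	struct sketch_pkt *soft[SKETCH_QLEN];	/* software queue (buf_ring) */
	int soft_count;
	int hw_free;				/* free NIC ring descriptors */
	int hw_sent;
};

/* Move packets from the software queue to the (simulated) NIC ring. */
static void
sketch_drain(struct sketch_txq *q)
{
	while (q->soft_count > 0 && q->hw_free > 0) {
		free(q->soft[0]);	/* pretend the hardware consumed it */
		for (int i = 1; i < q->soft_count; i++)
			q->soft[i - 1] = q->soft[i];
		q->soft_count--;
		q->hw_free--;
		q->hw_sent++;
	}
}

/*
 * Returns 0 if 'm' was accepted (it is now owned by the queue) and
 * ENOBUFS only if 'm' was dropped and freed.
 */
static int
sketch_transmit(struct sketch_txq *q, struct sketch_pkt *m)
{
	if (q->soft_count == SKETCH_QLEN) {
		free(m);
		return (ENOBUFS);
	}
	q->soft[q->soft_count++] = m;
	sketch_drain(q);	/* a full NIC ring is not an error for 'm' */
	return (0);
}
```

With this shape a caller may assume that a non-zero return means the packet is gone, which matches the assumption the netmap emulation code is described as making.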




Re: device vtnet - device virtio_net?

2013-12-23 Thread Bryan Venteicher


- Original Message -
 Hi,
 
 GENERIC has
 
 # VirtIO support
 device  virtio  # Generic VirtIO bus (required)
 device  virtio_pci  # VirtIO PCI device
 device  vtnet   # VirtIO Ethernet device
 device  virtio_blk  # VirtIO Block device
 device  virtio_scsi # VirtIO SCSI device
 device  virtio_balloon  # VirtIO Memory Balloon device
 
 Maybe it's just my OCD kicking in, but why is vtnet not named virtio_net?
 That would be consistent with the other virtio device names.
 


That's what I picked 3-some years ago, and it is too late to change it. I
believe my thinking at the time was to match most other Ethernet drivers:
the module name is if_vtnet, so use vtnet in the kernel config.


 Cheers,
 Jos
 --
 Jos Backus
 jos at catnook.com


Re: vtnet broken on -CURRENT when using VirtualBox

2013-12-07 Thread Bryan Venteicher


- Original Message -
 Hi list,
 I'm observing a 100%-reproducible panic in the following setup:
 
 Host system: FreeBSD 9.1-RELEASE-p7 amd64
 $ pkg info | grep virtualbox
 virtualbox-ose-4.2.18_1A general-purpose full virtualizer for
 x86 hardware
 virtualbox-ose-kmod-4.2.18 VirtualBox kernel module for FreeBSD
 
 System in a virtual machine: FreeBSD-CURRENT SVN rev 259064.
 Virtual machine is created with virtio host-only adapter.
 
 When trying to ssh into VM, the system in VM panics with the following
 message:
 
 panic: vtnet_txq_offload: mbuf 0xc309e900 TSO without checksum offload
 KDB: stack backtrace:
 db_trace_self_wrapper(c0b4fd4d,a6461,65393030,6039,c13a29c0,...) at
 db_trace_self_wrapper+0x2d/frame 0xc23f85a0
 kdb_backtrace(c0b4b145,c0c29a7c,c0b9b43d,c23f865c,c23f865c,...) at
 kdb_backtrace+0x30/frame 0xc23f8608
 vpanic(c0c29918,100,c0b9b43d,c23f865c,c23f865c,...) at vpanic+0x80/frame
 0xc23f862c
 kassert_panic(c0b9b43d,c0b9b466,c309e900,8b1,c0dad504,...) at
 kassert_panic+0xe9/frame 0xc23f8650
 vtnet_txq_mq_start_locked(c2e02810,0,c0b9b369,8ea,c2e02810,...) at
 vtnet_txq_mq_start_locked+0x62b/frame 0xc23f8808
 vtnet_txq_mq_start(c2cf7800,c309e900,6,c23f89e0,c23f8866,...) at
 vtnet_txq_mq_start+0x76/frame 0xc23f8834
 ether_output(c2cf7800,c309e900,c23f89e0,c23f89d0,c36639d8,...) at
 ether_output+0x64b/frame 0xc23f
 ip_output(c309e900,0,c23f89d0,0,0,...) at ip_output+0x173f/frame 0xc23f8938
 tcp_output(c36665e0,c342f400,32c,1,c36639d8,...) at
 tcp_output+0x1cbf/frame 0xc23f8a9c
 tcp_usr_send(c3410d40,0,c342f400,0,0,...) at tcp_usr_send+0x346/frame
 0xc23f8ad0
 sosend_generic(c3410d40,0,c23f8c10,0,0,...) at
 sosend_generic+0x3b3/frame 0xc23f8b40
 soo_write(c3142f50,c23f8c10,c2cf0d00,0,c3108620,...) at
 soo_write+0x5d/frame 0xc23f8b70
 dofilewrite(c3142f50,c23f8c10,,,0,...) at
 dofilewrite+0x86/frame 0xc23f8ba8
 kern_writev(c3108620,3,c23f8c10,0,28c4d608,...) at
 kern_writev+0x96/frame 0xc23f8bf0
 sys_write(c3108620,c23f8cc8,c23f8c98,c076b3a4,c0c36e90,...) at
 sys_write+0x5c/frame 0xc23f8c40
 syscall(c23f8d08) at syscall+0x2de/frame 0xc23f8cfc
 Xint0x80_syscall() at Xint0x80_syscall+0x21/frame 0xc23f8cfc
 --- syscall (4, FreeBSD ELF32, sys_write), eip = 0x2840dd77, esp =
 0xbfbfb328, ebp = 0xbfbfb348 ---
 KDB: enter: panic
 [ thread pid 1570 tid 100065 ]
 Stopped at  kdb_enter+0x3d: movl$0,kdb_why
 db
 
 
 Please help me to debug this.
 

I suspect I know what is wrong. What's the output of `ifconfig vtnetX`?

 --
 Regards,
 Ilya Bakulin
 
 


amd64 minidump slowness

2013-10-15 Thread Bryan Venteicher
Hi,

At $JOB, we have machines with 400GB RAM where even the smallest
15GB amd64 minidump takes well over an hour. The major cause of
the slowness is that in minidumpsys(), blk_write() is called
PAGE_SIZE at a time. This causes blk_write() to poll the console
for the Ctrl-C abort once per page.

The attached patch changes blk_write() to be called with a run of
physically contiguous pages. This reduced the dump time by over an
order of magnitude. Of course, blk_write() could also be changed to
poll the console less frequently (like only on every IO).

If anybody else dumps on machines with lots of RAM, it would be
nice to know the difference this patch makes. I've got a second
set of patches that further reduces the dump time by over half that
I'll try to clean up soon.
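
The coalescing idea can be sketched in user-space C: walk the page bitmap and hand each run of physically contiguous pages to a write callback in one call. This is a sketch under assumptions, not the patch itself: the sketch_* names, the fixed 4096-byte page size and the recording callback are all hypothetical, and __builtin_ctzll() (a GCC/Clang builtin) stands in for bsfq().

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096ULL

typedef int (*sketch_write_fn)(uint64_t pa, uint64_t len, void *arg);

/* Records each write so the number and size of runs can be inspected. */
struct sketch_log {
	int calls;
	uint64_t pa[8];
	uint64_t len[8];
};

static int
sketch_record(uint64_t pa, uint64_t len, void *arg)
{
	struct sketch_log *log = arg;

	if (log->calls >= 8)
		return (-1);
	log->pa[log->calls] = pa;
	log->len[log->calls] = len;
	log->calls++;
	return (0);
}

/* Emit one write per run of contiguous set bits (one bit per page). */
static int
sketch_dump_runs(const uint64_t *bitmap, size_t nwords, sketch_write_fn emit,
    void *arg)
{
	uint64_t start_pa = 0, len = 0;
	int error;

	for (size_t i = 0; i < nwords; i++) {
		uint64_t bits = bitmap[i];

		while (bits != 0) {
			int bit = __builtin_ctzll(bits);
			uint64_t pa = ((uint64_t)i * 64 + bit) *
			    SKETCH_PAGE_SIZE;

			if (len != 0 && start_pa + len != pa) {
				/* Gap found: flush the run so far. */
				error = emit(start_pa, len, arg);
				if (error != 0)
					return (error);
				len = 0;
			}
			if (len == 0)
				start_pa = pa;
			len += SKETCH_PAGE_SIZE;
			bits &= ~(1ULL << bit);
		}
	}
	/* Flush the final run, if any pages were seen at all. */
	if (len != 0)
		return (emit(start_pa, len, arg));
	return (0);
}
```

For a bitmap with pages 0-2 and 5-6 set, this issues two writes instead of five; runs also continue across 64-bit word boundaries.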

http://people.freebsd.org/~bryanv/patches/minidump.patch

commit 25f9e82e4ac93e71c6cf06fe2faa1899967db725
Author: Bryan Venteicher bryanventeic...@gmail.com
Date:   Sun Sep 29 13:56:42 2013 -0500

Call blk_write() with a run of physically contiguous pages

Previously, blk_write() was being called one page at a time, which
would cause it to poll the console for every page. This change makes
dumping a magnitude faster, and is especially useful on large memory
machines.

diff --git a/sys/amd64/amd64/minidump_machdep.c b/sys/amd64/amd64/minidump_machdep.c
index f14c539..26b2b31 100644
--- a/sys/amd64/amd64/minidump_machdep.c
+++ b/sys/amd64/amd64/minidump_machdep.c
@@ -221,7 +221,8 @@ minidumpsys(struct dumperinfo *di)
 	vm_offset_t va;
 	int error;
 	uint64_t bits;
-	uint64_t *pml4, *pdp, *pd, *pt, pa;
+	uint64_t *pml4, *pdp, *pd, *pt, start_pa, pa;
+	size_t sz;
 	int i, ii, j, k, n, bit;
 	int retry_count;
 	struct minidumphdr mdhdr;
@@ -412,18 +413,29 @@ minidumpsys(struct dumperinfo *di)
 	}
 
 	/* Dump memory chunks */
-	/* XXX cluster it up and use blk_dump() */
-	for (i = 0; i < vm_page_dump_size / sizeof(*vm_page_dump); i++) {
+	for (i = 0, start_pa = 0, sz = 0;
+	    i < vm_page_dump_size / sizeof(*vm_page_dump); i++) {
 		bits = vm_page_dump[i];
 		while (bits) {
 			bit = bsfq(bits);
 			pa = (((uint64_t)i * sizeof(*vm_page_dump) * NBBY) + bit) * PAGE_SIZE;
-			error = blk_write(di, 0, pa, PAGE_SIZE);
-			if (error)
-				goto fail;
+			if (sz == 0 || start_pa + sz == pa) {
+				if (sz == 0)
+					start_pa = pa;
+				sz += PAGE_SIZE;
+			} else {
+				error = blk_write(di, 0, start_pa, sz);
+				if (error)
+					goto fail;
+				start_pa = pa;
+				sz = PAGE_SIZE;
+			}
 			bits &= ~(1ul << bit);
 		}
 	}
+	error = blk_write(di, 0, start_pa, sz);
+	if (error)
+		goto fail;
 
 	error = blk_flush(di);
 	if (error)

Re: [CFT] VMware vmxnet3 ethernet driver

2013-09-02 Thread Bryan Venteicher


- Original Message -
 
 
 - Original Message -
  Regarding Bryan Venteicher's message of 27.08.2013 06:18 (localtime):
  
  ...
  
snip
 The intr usage is higher than the other drivers you compared against
 because if_vmx does the off-level processing in ithreads, whereas the
 others do it in a taskqueue.
 
 BTW: if_vmx can do LRO as well. I don't think the emulated e1000 can,
 but I bet the e1000e does.
 
  if_vmx - if_vmx
  1.32 GBits/sec, load: 10-45%Sys 40-48%Intr
  
  if_vmxJumbo - if_vmxJumbo
  5.01 GBits/sec, load: 10-45%Sys 40-48%Intr
  
  Please find attached the different outputs of dev.vmx.X (the mtu9000 run
  was
  only 3.47GBits/sec in that case, took the numbers anyway)
  

Thanks for the sysctl output. 

dev.vmx.0.txq0.ringfull: 133479
dev.vmx.0.txq0.hstats.tso_packets: 564986
dev.vmx.0.txq0.hstats.ucast_packets: 570604

For the number of packets transmitted, there's a really high
percentage of time we find the Tx queue full enough it is not
able to hold the next frame to transmit. I haven't been
able to recreate this. But I recently made a commit [1] that
might help alleviate this.

[1] http://svnweb.freebsd.org/base?view=revision&revision=255055
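
To put a rough number on "really high", the counters quoted above can be compared directly. This is only a sketch: percent_of() is a hypothetical helper, and treating ringfull events as a fraction of ucast_packets is an assumption about how the two counters relate, since they may not count the same unit.

```c
#include <stdint.h>

/* Percentage of `events` relative to `total`; returns 0 when total is 0. */
static double
percent_of(uint64_t events, uint64_t total)
{
	if (total == 0)
		return (0.0);
	return (100.0 * (double)events / (double)total);
}
```

With the values from the sysctl output, percent_of(133479, 570604) comes out at a little over 23, i.e. the ring was found full roughly once for every four unicast packets counted.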

  wbr,
  
  -Harry
  
  

Re: [CFT] VMware vmxnet3 ethernet driver

2013-08-27 Thread Bryan Venteicher


- Original Message -
 Regarding Bryan Venteicher's message of 27.08.2013 06:18 (localtime):
 
 ...
 
  It seems if_vmx doesn't support jumbo frames. If I set mtu 9000, I get
  »vmx0: cannot populate Rx queue 0«, I have no problems using jumbo
  frames with vmxnet3.
 
  This could fail for two reasons - could not allocate an mbuf cluster,
  or the call to bus_dmamap_load_mbuf_sg() failed. For the former, you
  should check vmstat -z. For the latter, the behavior of
  bus_dmamap_load_mbuf_sg()
  changed between 9.1 and 9.2, and I know it was broken for a while. I don't
  recall exactly when I fixed it (I think shortly after I made the original
  announcement). Could you retry with the files from HEAD @ [1]? Also, there
  are new sysctl oids (dev.vmx.X.mbuf_load_failed & dev.vmx.X.mgetcl_failed)
  for these errors.
 
  I just compiled the driver on 9.2-RC2 with the sources from HEAD and was
  able to change the MTU to 9000.
 
  [1]- http://svnweb.freebsd.org/base/head/sys/dev/vmware/vmxnet3/
 
 Thanks a lot for your ongoing work!
 I can confirm that with recent if_vmx.c from head and compiled for
 9.2-RC3, setting mtu to 9000 works as expected :-)
 
 
  I took an oldish host (4x2,8GHz Core2[LGA775]) with recent software: ESXi
  5.1U1 and FreeBSD-9.2-RC2
  Two guests are connected to one MTU9000 VMware Software Switch.
 
  I've got a few performance things to still look at. What's the sysctl
  dev.vmx.X output for the if_vmx-if_vmx tests?
 
 Just repeated the simple if_vmx iperf bench; results vary slightly from
 standard 10sec run to run, but Intr usage is still noticeably high:


The intr usage is higher than the other drivers you compared against
because if_vmx does the off-level processing in ithreads whereas the
others do it in a taskqueue.

BTW: if_vmx can do LRO as well. I don't think the emulated e1000 can,
but I bet the e1000e does.

 if_vmx - if_vmx
 1.32 GBits/sec, load: 10-45%Sys 40-48%Intr
 
 if_vmxJumbo - if_vmxJumbo
 5.01 GBits/sec, load: 10-45%Sys 40-48%Intr
 
 Please find attached the different outputs of dev.vmx.X (the mtu9000 run was
 only 3.47GBits/sec in that case, took the numbers anyway)
 
 wbr,
 
 -Harry
 
 

Re: [CFT] VMware vmxnet3 ethernet driver

2013-08-26 Thread Bryan Venteicher


- Original Message -
 Regarding Bryan Venteicher's message of 05.08.2013 02:12 (localtime):
  Hi,
 
  I've ported the OpenBSD vmxnet3 ethernet driver to FreeBSD. I did a
  lot of cleanup, bug fixes, new features, etc (+2000 new lines) along
  the way so there is not much of a resemblance left.
 
  The driver is in good enough shape I'd like additional testers. A patch
  against -CURRENT is at [1]. Alternatively, the driver and a Makefile are
  at [2]; this should compile at least as far back as 9.1. I can look at
  8-STABLE if there is interest.
 
  Obviously, besides reports of 'it works', I'm interested in performance vs
  the emulated e1000, and (for those using it) the VMware tools vmxnet3
  driver. Hopefully it is no worse :)
 
 Hello Bryan,
 
 thanks a lot for your hard work!
 
 It seems if_vmx doesn't support jumbo frames. If I set mtu 9000, I get
 »vmx0: cannot populate Rx queue 0«, I have no problems using jumbo
 frames with vmxnet3.
 

This could fail for two reasons - could not allocate an mbuf cluster,
or the call to bus_dmamap_load_mbuf_sg() failed. For the former, you
should check vmstat -z. For the latter, the behavior of bus_dmamap_load_mbuf_sg()
changed between 9.1 and 9.2, and I know it was broken for a while. I don't
recall exactly when I fixed it (I think shortly after I made the original
announcement). Could you retry with the files from HEAD @ [1]? Also, there
are new sysctl oids (dev.vmx.X.mbuf_load_failed & dev.vmx.X.mgetcl_failed)
for these errors.

I just compiled the driver on 9.2-RC2 with the sources from HEAD and was
able to change the MTU to 9000.

[1]- http://svnweb.freebsd.org/base/head/sys/dev/vmware/vmxnet3/

 I took an oldish host (4x2,8GHz Core2[LGA775]) with recent software: ESXi
 5.1U1 and FreeBSD-9.2-RC2
 Two guests are connected to one MTU9000 VMware Software Switch.
 

I've got a few performance things to still look at. What's the sysctl 
dev.vmx.X output for the if_vmx-if_vmx tests?


 Simple iperf (standard TCP) results:
 
 vmxnet3jumbo - vmxnet3jumbo
 5.3Gbits/sec, load: 40-60%Sys 0.5-2%Intr
 
 vmxnet3 - vmxnet3
 1.85 GBits/sec, load: 60-80%Sys 0-0.8%Intr
 
 
 if_vmx - if_vmx
 1.51 GBits/sec, load: 10-45%Sys 40-48%Intr
 !!!
 if_vmxjumbo - if_vmxjumbo not possible
 
 
 if_em(e1000) - if_em(e1000)
 1.23 GBits/sec, load: 80-60%Sys 0.5-8%Intr
 
 if_em(e1000)jumbo - if_em(e1000)jumbo
 2.27Gbits/sec, load: 40-30%Sys 0.5-5%Intr
 
 
 if_igb(e1000e)jumbo - if_igb(e1000e)jumbo
 5.03 Gbits/s, load: 70-60%Sys 0.5%Intr
 
 if_igb(e1000e) - if_igb(e1000e)
 1.39 Gbits/s, load: 60-80%Sys 0.5%Intr
 
 
 if_igb(e1000e) - if_igb(e1000e), both hw.em.[rt]xd=4096
 1.66 Gbits/s, load: 65-90%Sys 0.5%Intr
 
 if_igb(e1000e)jumbo - if_igb(e1000e)jumbo, both hw.em.[rt]xd=4096
 4.81 Gbits/s, load: 65%Sys 0.5%Intr
 
 Conclusion:
 if_vmx performs well compared to the regular emulated nics and standard
 MTU, but it's behind tuned e1000e nic emulation and can't reach vmxnet3
 performance with regular mtu. If one needs throughput, the missing jumbo
 frame support in if_vmx  is a show stopper.
 
 e1000e is preferable over e1000, even if not officially choosable with
 FreeBSD-selection as guest (edit .vmx and alter ethernet0.virtualDev =
 e1000e, and don't forget to set hw.em.enable_msix=0 in loader.conf,
 although the driver e1000e attaches is if_igb!)
 
 Thanks,
 
 -Harry
 


Re: [CFT] VMware vmxnet3 ethernet driver

2013-08-07 Thread Bryan Venteicher


- Original Message -
 it'd be nice if we could get vmware to just support the drivers in tree..
 by which I mean, just submit patches.. why do they need to have it out
 of tree?
 

I agree. But they are all unfriendly licensed. The FF had a discussion
to get them relicensed to something more suitable, but that went nowhere
over the past year.

It is unfortunate that this issue of vendor-supplied, out-of-tree drivers
is still around. Linux should have taught companies how foolish this is.


Re: [CFT] VMware vmxnet3 ethernet driver

2013-08-06 Thread Bryan Venteicher


- Original Message -
 Perhaps not, but they do support FreeBSD. I've started several support cases
 with FreeBSD-specific problems and they've fixed all so far.
 

Yes, it is not a blackhole of support. At $JOB, we got caught by the FreeBSD
specific issue of the busted timer that was fixed. But they've been less helpful
in other regards, and have more or less said FreeBSD isn't high in their
priority because it isn't Linux.

 Are you aiming at completely replacing VMware tools, or just the device
 drivers?
 

I'd like as much as possible to work out of the box. vmxnet3 is as far as
my current interests go. OpenBSD has a vmt device that apparently does (at
least the important bits of) what vmtoolsd does; I'll look at that closer
at some point.

I have no intention of preventing people from using VMware's tools if
they desire, nor breaking existing users.

 --
 Joel
 


Re: [net] protecting interfaces from races between control and data ?

2013-08-05 Thread Bryan Venteicher


- Original Message -
 i am slightly unclear of what mechanisms we use to prevent races
 between interface being reconfigured (up/down/multicast setting, etc,
 all causing reinitialization of the rx and tx rings) and
 
 i) packets from the host stack being sent out;
 ii) interrupts from the network card being processed.
 
 I think in the old times IFF_DRV_RUNNING was used for this purpose,
 but now it is not enough.
 Acquiring the core lock in the NIC does not seem enough, either,
 because newer drivers, especially multiqueue ones, have per-queue
 rx and tx locks.
 

What I've done in my drivers is:
  * Lock the core mutex
  * Clear IFF_DRV_RUNNING
  * Lock/unlock each queue's lock

The various Rx/Tx queue functions check for IFF_DRV_RUNNING after
(re)acquiring their queue lock. See vtnet_stop_rendezvous() at
[1] for an example.
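
The three steps above can be sketched in userland with pthreads. This is an analogy under stated assumptions, not the if_vtnet code: struct softc, struct queue, sc_stop() and queue_process() are hypothetical names, an atomic bool stands in for the IFF_DRV_RUNNING flag, and pthread mutexes stand in for the kernel mutexes.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define NQUEUES	4

struct queue {
	pthread_mutex_t lock;
	unsigned long processed;
};

struct softc {
	pthread_mutex_t core_lock;
	atomic_bool running;		/* stands in for IFF_DRV_RUNNING */
	struct queue q[NQUEUES];
};

static void
sc_init(struct softc *sc)
{
	pthread_mutex_init(&sc->core_lock, NULL);
	atomic_init(&sc->running, true);
	for (int i = 0; i < NQUEUES; i++) {
		pthread_mutex_init(&sc->q[i].lock, NULL);
		sc->q[i].processed = 0;
	}
}

/* Data path: check the flag only after taking the queue lock. */
static bool
queue_process(struct softc *sc, int idx)
{
	struct queue *q = &sc->q[idx];
	bool ok;

	pthread_mutex_lock(&q->lock);
	ok = atomic_load(&sc->running);
	if (ok)
		q->processed++;		/* placeholder for ring work */
	pthread_mutex_unlock(&q->lock);
	return (ok);
}

/* Control path: clear the flag, then rendezvous with every queue. */
static void
sc_stop(struct softc *sc)
{
	pthread_mutex_lock(&sc->core_lock);
	atomic_store(&sc->running, false);
	for (int i = 0; i < NQUEUES; i++) {
		/* Waits out any queue function already inside its lock. */
		pthread_mutex_lock(&sc->q[i].lock);
		pthread_mutex_unlock(&sc->q[i].lock);
	}
	pthread_mutex_unlock(&sc->core_lock);
}
```

Once sc_stop() returns, no queue function can still be touching its ring: any that held a queue lock has released it, and any that takes the lock afterwards sees the cleared flag and backs off.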

 Does anyone know if there is a generic mechanism, or each driver
 reimplements its own way ?
 

We desperately need a saner ifnet/driver interface. I think andre@ 
had some previous work in this area (and additional plans as well?).
IMO, there's a lot to like on what DragonflyBSD has done in this area.

[1] - 
http://svnweb.freebsd.org/base/user/bryanv/vtnetmq/sys/dev/virtio/network/if_vtnet.c?revision=252451&view=markup

 thanks
 luigi


Re: [net] protecting interfaces from races between control and data ?

2013-08-05 Thread Bryan Venteicher


- Original Message -
 On Mon, Aug 5, 2013 at 8:19 PM, Adrian Chadd adr...@freebsd.org wrote:
 
  No, brian said two things:
 
  * the flag, protected by the core lock
  * per-queue flags
 
 
 i see no mentions on per-queue flags on his email.
 This is the relevant part
 

Right, I just use the IFF_DRV_RUNNING flag. I think Adrian meant
'per-queue locks' here? 

 
 
 What I've done in my drivers is:
   * Lock the core mutex
   * Clear IFF_DRV_RUNNING
   * Lock/unlock each queue's lock
 
 The various Rx/Tx queue functions check for IFF_DRV_RUNNING after
 (re)acquiring their queue lock. See vtnet_stop_rendezvous() at
 [1] for an example.
 
 [1]
 http://svnweb.freebsd.org/base/user/bryanv/vtnetmq/sys/dev/virtio/network/if_vtnet.c?revision=252451&view=markup
 
 -
 
 
 
 
 
  -adrian
 
 
 
 
 --
 -+---
  Prof. Luigi RIZZO, ri...@iet.unipi.it  . Dip. di Ing. dell'Informazione
  http://www.iet.unipi.it/~luigi/. Universita` di Pisa
  TEL  +39-050-2211611   . via Diotisalvi 2
  Mobile   +39-338-6809875   . 56122 PISA (Italy)
 -+---
 


Re: [CFT] VMware vmxnet3 ethernet driver

2013-08-05 Thread Bryan Venteicher


- Original Message -
 I have ~100 FreeBSD 8/9 VMs in my vSphere 5.1 environment, all using the
 VMware tools package from VMware. Everything has been running great for
 years.
 (we skipped vSphere 5.0). Why should I use this vmxnet driver instead of the
 VMware tools driver or the emulated e1000?
 

They are out of tree and subject to rotting. I had to use the patches
at [1] to even get them to compile on 9.1 and -current. I don't think
VMware puts much engineering resources behind it; there was a compiler
warning of a silly bug like:
if (foo) ;
do_something();
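
For readers unfamiliar with this class of bug, here is a small self-contained illustration. It is not the VMware code: buggy(), fixed() and the calls counter are hypothetical names used only to show the effect of the stray semicolon.

```c
#include <stdbool.h>

static int calls;

static void
do_something(void)
{
	calls++;
}

/*
 * The ';' right after the condition is an empty statement, so it is the
 * whole body of the if. The call below runs unconditionally.
 */
static void
buggy(bool foo)
{
	if (foo) ;
	do_something();
}

/* Without the stray semicolon the call is guarded as intended. */
static void
fixed(bool foo)
{
	if (foo)
		do_something();
}
```

Calling buggy(false) still increments calls, while fixed(false) leaves it alone. Compilers can flag the first form (for example with -Wempty-body), which matches the warning mentioned above.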

vmxnet3 has modern features (LRO, IPv6 checksum offloading, etc.) that
the emulated e1000 lacks. In my test setup, e1000 tops out at 30MB/sec
but vmxnet3 goes to 50MB/sec. I'd like to hear others' experiences.

[1] - http://ogris.de/vmware/

 --
 Joel
 


[CFT] VMware vmxnet3 ethernet driver

2013-08-04 Thread Bryan Venteicher
Hi,

I've ported the OpenBSD vmxnet3 ethernet driver to FreeBSD. I did a
lot of cleanup, bug fixes, new features, etc (+2000 new lines) along
the way so there is not much of a resemblance left.

The driver is in good enough shape I'd like additional testers. A patch
against -CURRENT is at [1]. Alternatively, the driver and a Makefile are
at [2]; this should compile at least as far back as 9.1. I can look at
8-STABLE if there is interest.

Obviously, besides reports of 'it works', I'm interested in performance vs
the emulated e1000, and (for those using it) the VMware tools vmxnet3
driver. Hopefully it is no worse :)

The driver supports most VMXNET3 features - IPv4/IPv6 checksum offload,
TSO, LRO, VLAN tag offload. AFAIK, the only notable missing feature is
multiqueue; 3/4 of the code needed is already in the driver, but I don't
have time to do the final bit of work.

Most of the development was done on QEMU 1.5, but also tested on VMware
Fusion and VMware ESXi.

[1] - http://people.freebsd.org/~bryanv/vmware/if_vmx.patch
[2] - http://people.freebsd.org/~bryanv/vmware/files/


Re: Problem with curret in vmware

2013-07-31 Thread Bryan Venteicher
 On Tuesday, July 30, 2013 5:25:06 am Alexander Yerenkow wrote:
  Hello all.
  I have panics in vmware with installed vmwaretools (the suspected
  culprit).
  Seems that memory ballooning (or using more memory in all vms than there is
  in host)
  produces some kind of weird behavior in FreeBSD.
  This vm hasn't been shut down yet; is there something I can do to help
  investigate this?
  
  Panic screens:
  http://gits.kiev.ua/FreeBSD/panic1.png
  http://gits.kiev.ua/FreeBSD/panic2.png
 
 Looks like their code needs to be updated to work with locking changes in
 HEAD.  Attilio is probably the best person to ask.
 

This highlights why we should move away from the poorly supported, out of
tree, unfriendly licensed VMware tools. I have a port of the vmxnet3 from
OpenBSD [1] that I intend to commit in time for 10. Next, I hope to look
at the OpenBSD vmt [2] VMware tools driver.

The balloon is a bit trickier. AFAIK, OpenBSD doesn't have a driver for
easy porting. The VMware tools driver for FreeBSD is GPL licensed, and
VMware has shown no interest/ability to relicense their tools. Likely,
the best way forward is to port their CDDL licensed Solaris driver.

[1] - http://svnweb.freebsd.org/base/projects/vmxnet/sys/dev/vmware/vmxnet3/
[2] - 
http://www.openbsd.org/cgi-bin/man.cgi?query=vmt&apropos=0&sektion=0&manpath=OpenBSD+Current&arch=i386&format=html

 --
 John Baldwin


Re: VirtIO in GENERIC

2013-01-07 Thread Bryan Venteicher
On Mon, Dec 17, 2012 at 1:17 AM, Bryan Venteicher bry...@freebsd.org wrote:
 On Sun, Dec 16, 2012 at 11:06 PM, Jim Harris jimhar...@freebsd.org wrote:


 On Sun, Dec 16, 2012 at 6:53 PM, Andrew Thompson thom...@freebsd.org
 wrote:

 On 17 December 2012 13:17, Bryan Venteicher bry...@freebsd.org wrote:

  There's been lots of requests to have VirtIO in GENERIC for i386 and
  amd64. Anybody have any issues or concerns with this or the patch at
  [1]. This also removes the kludge that was introduced in r239009.
 
  I've compiled LINT for i386 and amd64 so hopefully there won't be any
  surprise breakages.
 
  [1] http://people.freebsd.org/~bryanv/patches/virtio.generic.patch


 It would be great to have the drivers enabled. You do not need the
 sys/conf/files changes, the common and arch files are combined.


 Removing the virtio files from sys/conf/files ensures these drivers can only
 be specified in x86 kernel configuration files.  r239009 added these lines
 to sys/conf/files, but Bryan's patch does it more correctly.

 The only question I have is the GENERIC changes where device virtio is
 added - it says it is required, but should this instead say it's required
 for any of the other drivers in this section?


 Yes, that wording could be improved; will update the patch in the morning.


Hmm .. on second thought, I think 'required' is sufficiently clear on
its own that it applies only to this section. Other nearby sections
(USB, sound) use the word also.

For the time being, I still intend to add VirtIO only to i386 and
amd64 GENERIC. ARM and PPC64 can join the club later once I have a
chance to test/debug them on QEMU.

I'd like to commit this this weekend if nobody raises any objections.

 -Jim



Re: VirtIO in GENERIC

2012-12-17 Thread Bryan Venteicher
On Mon, Dec 17, 2012 at 12:06 AM, Andrew Thompson thom...@freebsd.org wrote:
 On 17 December 2012 18:06, Jim Harris jimhar...@freebsd.org wrote:



 On Sun, Dec 16, 2012 at 6:53 PM, Andrew Thompson thom...@freebsd.org
 wrote:

 On 17 December 2012 13:17, Bryan Venteicher bry...@freebsd.org wrote:

  There's been lots of requests to have VirtIO in GENERIC for i386 and
  amd64. Anybody have any issues or concerns with this or the patch at
  [1]. This also removes the kludge that was introduced in r239009.
 
  I've compiled LINT for i386 and amd64 so hopefully there won't be any
  surprise breakages.
 
  [1] http://people.freebsd.org/~bryanv/patches/virtio.generic.patch


 It would be great to have the drivers enabled. You do not need the
 sys/conf/files changes, the common and arch files are combined.


 Removing the virtio files from sys/conf/files ensures these drivers can
 only be specified in x86 kernel configuration files.  r239009 added these
 lines to sys/conf/files, but Bryan's patch does it more correctly.



Yes, I think the patch is correct for what I intended - support for
x86 only (for now).

 Linux supports virtio on ARM so I dont think its necessarily x86 MD. I guess
 it can be moved back later.


I think VirtIO on ARM (on QEMU) effectively requires VirtIO-MMIO,
which we don't support yet. And virtio_pci is probably missing some
bus_space_barriers() required for non-x86. Both are on my TODO, but
nobody has prodded me about either yet.


 Andrew


VirtIO in GENERIC

2012-12-16 Thread Bryan Venteicher
There have been lots of requests to have VirtIO in GENERIC for i386 and
amd64. Does anybody have any issues or concerns with this or the patch at
[1]? This also removes the kludge that was introduced in r239009.

I've compiled LINT for i386 and amd64 so hopefully there won't be any
surprise breakages.

[1] http://people.freebsd.org/~bryanv/patches/virtio.generic.patch


Re: VirtIO in GENERIC

2012-12-16 Thread Bryan Venteicher
On Sun, Dec 16, 2012 at 11:06 PM, Jim Harris jimhar...@freebsd.org wrote:


 On Sun, Dec 16, 2012 at 6:53 PM, Andrew Thompson thom...@freebsd.org
 wrote:

 On 17 December 2012 13:17, Bryan Venteicher bry...@freebsd.org wrote:

  There's been lots of requests to have VirtIO in GENERIC for i386 and
  amd64. Anybody have any issues or concerns with this or the patch at
  [1]. This also removes the kludge that was introduced in r239009.
 
  I've compiled LINT for i386 and amd64 so hopefully there won't be any
  surprise breakages.
 
  [1] http://people.freebsd.org/~bryanv/patches/virtio.generic.patch


 It would be great to have the drivers enabled. You do not need the
 sys/conf/files changes, the common and arch files are combined.


 Removing the virtio files from sys/conf/files ensures these drivers can only
 be specified in x86 kernel configuration files.  r239009 added these lines
 to sys/conf/files, but Bryan's patch does it more correctly.

 The only question I have is the GENERIC changes where device virtio is
 added - it says it is required, but should this instead say it's required
 for any of the other drivers in this section?


Yes, that wording could be improved; will update the patch in the morning.

 -Jim



Re: [head tinderbox] failure on i386/i386

2012-10-12 Thread Bryan Venteicher
Hi,

- Original Message -
 From: FreeBSD Tinderbox tinder...@freebsd.org
 To: FreeBSD Tinderbox tinder...@freebsd.org, curr...@freebsd.org, 
 i...@freebsd.org
 Sent: Friday, October 12, 2012 6:11:27 AM
 Subject: [head tinderbox] failure on i386/i386
 
 TB --- 2012-10-12 04:50:01 - tinderbox 2.9 running on
 freebsd-current.sentex.ca
 TB --- 2012-10-12 04:50:01 - FreeBSD freebsd-current.sentex.ca
 8.3-PRERELEASE FreeBSD 8.3-PRERELEASE #0: Mon Mar 26 13:54:12 EDT
 2012 d...@freebsd-current.sentex.ca:/usr/obj/usr/src/sys/GENERIC
  amd64
 TB --- 2012-10-12 04:50:01 - starting HEAD tinderbox run for
 i386/i386
 TB --- 2012-10-12 04:50:01 - cleaning the object tree
 TB --- 2012-10-12 04:50:01 - checking out /src from
 svn://svn.freebsd.org/base/head
 TB --- 2012-10-12 04:50:01 - cd /tinderbox/HEAD/i386/i386
 TB --- 2012-10-12 04:50:01 - /usr/local/bin/svn cleanup /src
 TB --- 2012-10-12 04:53:23 - /usr/local/bin/svn update /src
 TB --- 2012-10-12 04:53:42 - At svn revision 241478

[SNIP]

 TB --- 2012-10-12 10:54:26 - /usr/bin/make -B buildkernel
 KERNCONF=XEN
  Kernel build for XEN started on Fri Oct 12 10:54:26 UTC 2012
  stage 1: configuring the kernel
  stage 2.1: cleaning up the object tree
  stage 2.2: rebuilding the object tree
  stage 2.3: build tools
  stage 3.1: making dependencies
  stage 3.2: building everything
 [...]
 objcopy --only-keep-debug virtio_balloon.ko.debug
 virtio_balloon.ko.symbols
 objcopy --strip-debug --add-gnu-debuglink=virtio_balloon.ko.symbols
 virtio_balloon.ko.debug virtio_balloon.ko
 === virtio/scsi (all)
 cc -O2 -pipe -fno-strict-aliasing -Werror -D_KERNEL -DKLD_MODULE
 -nostdinc   -DHAVE_KERNEL_OPTION_HEADERS -include
 /obj/i386.i386/src/sys/XEN/opt_global.h -I. -I@ -I@/contrib/altq
 -finline-limit=8000 --param inline-unit-growth=100 --param
 large-function-growth=1000 -fno-common -g
 -I/obj/i386.i386/src/sys/XEN  -mno-align-long-strings
 -mpreferred-stack-boundary=2 -mno-mmx -mno-sse -msoft-float
 -ffreestanding -fstack-protector -std=iso9899:1999 -fstack-protector
 -Wall -Wredundant-decls -Wnested-externs -Wstrict-prototypes
  -Wmissing-prototypes -Wpointer-arith -Winline -Wcast-qual  -Wundef
 -Wno-pointer-sign -fformat-extensions  -Wmissing-include-dirs
 -fdiagnostics-show-option   -c
 /src/sys/modules/virtio/scsi/../../../dev/virtio/scsi/virtio_scsi.c
 cc1: warnings being treated as errors
 /src/sys/modules/virtio/scsi/../../../dev/virtio/scsi/virtio_scsi.c:
 In function 'vtscsi_sg_append_scsi_buf':
 /src/sys/modules/virtio/scsi/../../../dev/virtio/scsi/virtio_scsi.c:974:
 warning: cast from pointer to integer of different size
 [-Wpointer-to-int-cast]
 /src/sys/modules/virtio/scsi/../../../dev/virtio/scsi/virtio_scsi.c:982:
 warning: cast to pointer from integer of different size
 [-Wint-to-pointer-cast]
 *** [virtio_scsi.o] Error code 1

I cannot seem to recreate this locally, but I think these need
to be cast through uintptr_t?

diff --git a/sys/dev/virtio/scsi/virtio_scsi.c b/sys/dev/virtio/scsi/virtio_scsi.c
index f2e1412..79bc988 100644
--- a/sys/dev/virtio/scsi/virtio_scsi.c
+++ b/sys/dev/virtio/scsi/virtio_scsi.c
@@ -971,7 +971,7 @@ vtscsi_sg_append_scsi_buf(struct vtscsi_softc *sc, struct sglist *sg,
 			    csio->data_ptr, csio->dxfer_len);
 		else
 			error = sglist_append_phys(sg,
-			    (vm_paddr_t) csio->data_ptr, csio->dxfer_len);
+			    (vm_paddr_t)(uintptr_t) csio->data_ptr, csio->dxfer_len);
 	} else {
 
 		for (i = 0; i < csio->sglist_cnt && error == 0; i++) {
@@ -979,7 +979,7 @@ vtscsi_sg_append_scsi_buf(struct vtscsi_softc *sc, struct sglist *sg,
 
 			if ((ccbh->flags & CAM_SG_LIST_PHYS) == 0)
 				error = sglist_append(sg,
-				    (void *) dseg->ds_addr, dseg->ds_len);
+				    (void *)(uintptr_t) dseg->ds_addr, dseg->ds_len);
 			else
 				error = sglist_append_phys(sg,
 				    (vm_paddr_t) dseg->ds_addr, dseg->ds_len);

That being said, compiling VirtIO for a XEN kernel probably
doesn't make any sense.
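
As a standalone illustration of the cast pattern in the patch above: converting between a pointer and a wider integer type in one step is what triggers the size-mismatch warnings on a 32-bit target, and going through uintptr_t avoids them. In this sketch paddr_t is a hypothetical 64-bit stand-in for an address type such as vm_paddr_t, and the helper names are made up for the example.

```c
#include <stdint.h>

/* Hypothetical 64-bit address type; wider than a pointer on i386. */
typedef uint64_t paddr_t;

/* Integer to pointer: narrow to pointer width first, then convert. */
static inline void *
addr_to_ptr(paddr_t addr)
{
	return ((void *)(uintptr_t)addr);
}

/* Pointer to integer: convert at pointer width first, then widen. */
static inline paddr_t
ptr_to_addr(const void *p)
{
	return ((paddr_t)(uintptr_t)p);
}
```

The intermediate cast only silences the warning; when the integer is wider than a pointer, addr_to_ptr() still discards the upper bits, so it is only meaningful for values that fit.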

Bryan

 
 Stop in /src/sys/modules/virtio/scsi.
 *** [all] Error code 1
 
 Stop in /src/sys/modules/virtio.
 *** [all] Error code 1
 
 Stop in /src/sys/modules.
 *** [modules-all] Error code 1
 
 Stop in /obj/i386.i386/src/sys/XEN.
 *** [buildkernel] Error code 1
 
 Stop in /src.
 *** Error code 1
 
 Stop in /src.
 TB --- 2012-10-12 11:11:27 - WARNING: /usr/bin/make returned exit
 code  1
 TB --- 2012-10-12 11:11:27 - ERROR: failed to build XEN kernel
 TB --- 2012-10-12 11:11:27 - 17474.50 user 2374.09 system 22886.60
 real
 
 
 http://tinderbox.freebsd.org/tinderbox-head-HEAD-i386-i386.full

Re: Awful FreeBSD 9 block IO performance in KVM

2012-07-22 Thread Bryan Venteicher


- Original Message -
 From: Dieter BSD dieter...@engineer.com
 To: hack...@freebsd.org, curr...@freebsd.org
 Sent: Sunday, July 22, 2012 1:19:32 AM
 Subject: Re: Awful FreeBSD 9 block IO performance in KVM
 
  da0: 3.300MB/s transfers
  da0: Command Queueing enabled
 
  root@freebsd:/root # dd if=/dev/zero of=/dev/da1 bs=16384
  count=262144
 
  4294967296 bytes transferred in 615.840721 secs (6974153 bytes/sec)
 
 1) Does a larger block size (bs=1m) help?
 
 2) That's roughly the speed I'd expect without queueing. Is it really
 making effective use of queueing, or is something limiting queueing to
 one transfer at a time?

The likely fix here is basically to do vtblk_startio() in a separate
kproc that vtblk_strategy() enqueues bios to. This has been on my
todo list for a while, but I haven't had the time. Also, the use of
bioq_disksort() probably doesn't gain much for virtualized disks,
but I never found much of a difference in my testing.
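
The shape of that fix, a strategy routine that only enqueues and a separate worker that starts the I/O, can be sketched in userland. This is a pthread analogy and not virtio_blk code: struct req stands in for a bio, reqq_enqueue() for the strategy routine, reqq_worker() for the kproc body, and all of these names are hypothetical.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

struct req {			/* stands in for a bio */
	struct req *next;
	bool done;
};

struct reqq {
	pthread_mutex_t lock;
	pthread_cond_t cv;
	struct req *head, *tail;
	bool shutdown;
	int completed;
};

static void
reqq_init(struct reqq *q)
{
	pthread_mutex_init(&q->lock, NULL);
	pthread_cond_init(&q->cv, NULL);
	q->head = q->tail = NULL;
	q->shutdown = false;
	q->completed = 0;
}

/* "Strategy" side: only queue the request and wake the worker. */
static void
reqq_enqueue(struct reqq *q, struct req *r)
{
	r->next = NULL;
	r->done = false;
	pthread_mutex_lock(&q->lock);
	if (q->tail != NULL)
		q->tail->next = r;
	else
		q->head = r;
	q->tail = r;
	pthread_cond_signal(&q->cv);
	pthread_mutex_unlock(&q->lock);
}

static void
reqq_shutdown(struct reqq *q)
{
	pthread_mutex_lock(&q->lock);
	q->shutdown = true;
	pthread_cond_broadcast(&q->cv);
	pthread_mutex_unlock(&q->lock);
}

/* Worker side: drain the queue, starting each request outside the lock. */
static void *
reqq_worker(void *arg)
{
	struct reqq *q = arg;
	struct req *r;

	pthread_mutex_lock(&q->lock);
	for (;;) {
		while (q->head == NULL && !q->shutdown)
			pthread_cond_wait(&q->cv, &q->lock);
		if (q->head == NULL)
			break;		/* shut down and fully drained */
		r = q->head;
		q->head = r->next;
		if (q->head == NULL)
			q->tail = NULL;
		pthread_mutex_unlock(&q->lock);
		r->done = true;		/* placeholder for submitting the I/O */
		pthread_mutex_lock(&q->lock);
		q->completed++;
	}
	pthread_mutex_unlock(&q->lock);
	return (NULL);
}
```

In real use the worker would be started once with pthread_create(&tid, NULL, reqq_worker, &q), so that submitters never block on the device path themselves.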

 ___
 freebsd-hack...@freebsd.org mailing list
 http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
 To unsubscribe, send any mail to
 freebsd-hackers-unsubscr...@freebsd.org
 


deadlkres() panic

2010-07-01 Thread Bryan Venteicher
On a recent -current, I got the following panic from deadlkres:

Assertion wchan != NULL failed at /usr/src-nfs/sys/kern/subr_sleepqueue.c:680

Tracing pid 0 tid 100058 td 0xff00024bf7a0
kdb_enter() at kdb_enter+0x3d
panic() at panic+0x176
sleepq_type() at sleepq_type+0x56
deadlkres() at deadlkres+0x224
fork_exit() at fork_exit+0x12a
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xff8074976d30, rbp = 0 ---
(Hand transcribed, doadump() hung)

deadlkres() came across a TD_IS_SLEEPING()'ing thread that was not on a
sleepqueue (ie, td->td_wchan == NULL).

I don't think this is an invalid state for a thread to be in: After adding itself
to a sleepq and setting a timeout, the thread calls sleepq_timedwait_sig().
sleepq_catch_signals() determines there is a signal pending so it removes the
thread from the sleepq via sleepq_resume_thread(). Returning to
sleepq_timedwait_sig(), in the call to sleepq_check_timeout(), the thread is
unable to cancel the timeout because it is already firing (likely waiting on
thread_lock()). So the thread calls TD_SET_SLEEPING() followed by mi_switch().
deadlkres() then picks up thread_lock(), finding td is TD_IS_SLEEPING() &&
!TD_ON_SLEEPQ().

The attached patch takes care of the panic for me.

--- /usr/src-nfs/sys/kern/kern_clock.c	2010-06-30 03:38:25.0 -0500
+++ kern_clock.c	2010-07-01 02:19:39.048697991 -0500
@@ -232,7 +232,8 @@
 	panic("%s: possible deadlock detected for %p, blocked for %d ticks\n",
 		__func__, td, tticks);
 	}
-} else if (TD_IS_SLEEPING(td)) {
+} else if (TD_IS_SLEEPING(td) &&
+    TD_ON_SLEEPQ(td)) {
 
 	/* Handle ticks wrap-up. */
 	if (ticks < td->td_blktick) {