Re: Call for testing: VM bugs in 10.3

2016-08-25 Thread Cedric Berger
Hello

I’ve been running this patch on two FreeBSD 10.3 servers:

  1) A fast physical server which was not experiencing the problem
     (Skylake + C236 + NVMe).

  2) A slow virtual server (ESXi 5.5, running on 6-year-old hardware) which
     I think was experiencing this problem when our Java apps were creating
     threads (typically at Java startup, or sometimes after a few weeks of
     usage).

The dmesg follows. I haven’t seen any problems on these two servers in 2 weeks.

It would be great if you could release this patch as an official 10.3 EN (Errata Notice).

Thanks,
Cedric
ced...@precidata.com

Copyright (c) 1992-2016 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 10.3-RELEASE-p6x #4: Mon Aug  8 17:08:40 CEST 2016

r...@ne-6.precidata.com:/usr/obj/svr/build/system/p15devel/src/sys/DATACENTER amd64
FreeBSD clang version 3.4.1 (tags/RELEASE_34/dot1-final 208032) 20140512
VT(efifb): resolution 800x600
CPU: Intel(R) Xeon(R) CPU E3-1275 v5 @ 3.60GHz (3600.15-MHz K8-class CPU)
  Origin="GenuineIntel"  Id=0x506e3  Family=0x6  Model=0x5e  Stepping=3
  Features=0xbfebfbff
  Features2=0x7ffafbff
  AMD Features=0x2c100800
  AMD Features2=0x121
  Structured Extended Features=0x29c6fbb
  XSAVE Features=0xf
  VT-x: PAT,HLT,MTF,PAUSE,EPT,UG,VPID
  TSC: P-state invariant, performance statistics
real memory  = 34359738368 (32768 MB)
avail memory = 32925085696 (31399 MB)
Event timer "LAPIC" quality 600
ACPI APIC Table: 
FreeBSD/SMP: Multiprocessor System Detected: 8 CPUs
FreeBSD/SMP: 1 package(s) x 4 core(s) x 2 SMT threads
 cpu0 (BSP): APIC ID:  0
 cpu1 (AP): APIC ID:  1
 cpu2 (AP): APIC ID:  2
 cpu3 (AP): APIC ID:  3
 cpu4 (AP): APIC ID:  4
 cpu5 (AP): APIC ID:  5
 cpu6 (AP): APIC ID:  6
 cpu7 (AP): APIC ID:  7
random:  initialized
ioapic0  irqs 0-119 on motherboard
module_register_init: MOD_LOAD (vesa, 0x80e315d0, 0) error 19
kbd1 at kbdmux0
cryptosoft0:  on motherboard
acpi0:  on motherboard
ACPI Error: [\134_SB_.PCI0.XHC_.RHUB.HS11] Namespace lookup failure, AE_NOT_FOUND (20150515/dswload-219)
ACPI Exception: AE_NOT_FOUND, During name lookup/catalog (20150515/psobject-233)
acpi0: Power Button (fixed)
cpu0:  on acpi0
cpu1:  on acpi0
cpu2:  on acpi0
cpu3:  on acpi0
cpu4:  on acpi0
cpu5:  on acpi0
cpu6:  on acpi0
cpu7:  on acpi0
hpet0:  iomem 0xfed0-0xfed003ff on acpi0
Timecounter "HPET" frequency 2400 Hz quality 950
Event timer "HPET" frequency 2400 Hz quality 550
atrtc0:  port 0x70-0x77 irq 8 on acpi0
atrtc0: Warning: Couldn't map I/O.
Event timer "RTC" frequency 32768 Hz quality 0
attimer0:  port 0x40-0x43,0x50-0x53 irq 0 on acpi0
Timecounter "i8254" frequency 1193182 Hz quality 0
Event timer "i8254" frequency 1193182 Hz quality 100
Timecounter "ACPI-fast" frequency 3579545 Hz quality 900
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x1808-0x180b on acpi0
pcib0:  port 0xcf8-0xcff on acpi0
pci0:  on pcib0
pcib1:  irq 16 at device 1.0 on pci0
pci1:  on pcib1
nvme0:  mem 0xdf11-0xdf113fff irq 16 at device 0.0 on pci1
vgapci0:  port 0xf000-0xf03f mem 0xde00-0xdeff,0xc000-0xcfff irq 16 at device 2.0 on pci0
vgapci0: Boot video device
xhci0:  mem 0xdf23-0xdf23 irq 16 at device 20.0 on pci0
xhci0: 32 bytes context size, 64-bit DMA
usbus0: waiting for BIOS to give up control
usbus0 on xhci0
pci0:  at device 22.0 (no driver attached)
pci0:  at device 22.3 (no driver attached)
ahci0:  port 0xf090-0xf097,0xf080-0xf083,0xf060-0xf07f mem 0xdf248000-0xdf249fff,0xdf24c000-0xdf24c0ff,0xdf24b000-0xdf24b7ff irq 16 at device 23.0 on pci0
ahci0: AHCI v1.31 with 8 6Gbps ports, Port Multiplier not supported
ahcich0:  at channel 0 on ahci0
ahcich1:  at channel 1 on ahci0
ahcich2:  at channel 2 on ahci0
ahcich3:  at channel 3 on ahci0
ahcich4:  at channel 4 on ahci0
ahcich5:  at channel 5 on ahci0
ahcich6:  at channel 6 on ahci0
ahcich7:  at channel 7 on ahci0
ahciem0:  on ahci0
pcib2:  irq 16 at device 28.0 on pci0
pci2:  on pcib2
pcib3:  irq 19 at device 28.7 on pci0
pci3:  on pcib3
igb0:  port 0xe000-0xe01f mem 0xdf00-0xdf07,0xdf08-0xdf083fff irq 19 at device 0.0 on pci3
igb0: Using MSIX interrupts with 5 vectors
igb0: Ethernet address: d0:50:99:c0:b7:0a
igb0: Bound queue 0 to cpu 0
igb0: Bound queue 1 to cpu 1
igb0: Bound queue 2 to 

Re: Call for testing: VM bugs in 10.3

2016-08-17 Thread Andrea Venturoli

On 08/02/16 21:25, Konstantin Belousov wrote:

> Below is the merge of some high-profile virtual memory subsystem bug
> fixes from stable/10 to 10.3. I merged fixes for bugs reported by
> users, issues which are even theoretically unlikely to occur in real
> world loads, are not included into the patch set. The later is mostly
> corrections for the handling of radix insertion failures. Included fixes
> are for random SIGSEGV delivered to processes, hangs on "vodead" state
> on filesystem operations, and several others.
>
> List of the merged revisions:
> r301184 prevent parallel object collapses, fixes object lifecycle
> r301436 do not leak the vm object lock, fixes overcommit disable
> r302243 avoid the active object marking for vm.vmtotal sysctl, fixes
> "vodead" hangs
> r302513 vm_fault() race with the vm_object_collapse(), fixes spurious SIGSEGV
> r303291 postpone BO_DEAD, fixes panic on fast vnode reclaim
>
> I am asking for some testing, it is not necessary for your system to
> exhibit the problematic behaviour for your testing to be useful. I am
> more looking for smoke-testing kind of confirmation that patch is fine.
> Neither I nor people who usually help me with testing,  run 10.3 systems.
>
> If everything appear to be fine, my intent is to ask re/so to issue
> Errata Notice with these changes in about a week from now.


I upgraded a 10.3/amd64 system which was in fact showing some possibly
related trouble.

So far so good: although it's close to impossible to deterministically
reproduce the lock-ups I've had, I haven't seen any regressions.

I plan to upgrade other boxes in the coming weeks.

 bye & Thanks
av.
___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: Call for testing: VM bugs in 10.3

2016-08-16 Thread lantan pig
I tested the patch on a 10.3-RELEASE-p7 system and have had no problems
so far over the last 2 days.


Re: Call for testing: VM bugs in 10.3

2016-08-07 Thread Andrea Venturoli

On 08/02/16 21:25, Konstantin Belousov wrote:

> Below is the merge of some high-profile virtual memory subsystem bug
> fixes from stable/10 to 10.3. I merged fixes for bugs reported by
> users, issues which are even theoretically unlikely to occur in real
> world loads, are not included into the patch set. The later is mostly
> corrections for the handling of radix insertion failures. Included fixes
> are for random SIGSEGV delivered to processes, hangs on "vodead" state
> on filesystem operations, and several others.
>
> List of the merged revisions:
> r301184 prevent parallel object collapses, fixes object lifecycle
> r301436 do not leak the vm object lock, fixes overcommit disable
> r302243 avoid the active object marking for vm.vmtotal sysctl, fixes
> "vodead" hangs
> r302513 vm_fault() race with the vm_object_collapse(), fixes spurious SIGSEGV
> r303291 postpone BO_DEAD, fixes panic on fast vnode reclaim
>
> I am asking for some testing, it is not necessary for your system to
> exhibit the problematic behaviour for your testing to be useful. I am
> more looking for smoke-testing kind of confirmation that patch is fine.
> Neither I nor people who usually help me with testing,  run 10.3 systems.
>
> If everything appear to be fine, my intent is to ask re/so to issue
> Errata Notice with these changes in about a week from now.


Hello and thanks for your work.

Does this have anything to do with
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=204764 ?

 bye & Thanks
av.


Re: Call for testing: VM bugs in 10.3

2016-08-02 Thread Konstantin Belousov
On Tue, Aug 02, 2016 at 01:57:49PM -0600, Ian Lepore wrote:
> On Tue, 2016-08-02 at 22:25 +0300, Konstantin Belousov wrote:
> > Below is the merge of some high-profile virtual memory subsystem bug
> > fixes from stable/10 to 10.3.

> I run 10-stable on my everyday desktop/build machine, but my mail
> client ruined the format of the patches.  Can I just 'svn up' on the
> 10-stable branch and then MFC the revs you list above, or are there
> hand-tweaks to the patches you attached?
If you svn up stable/10, or if your existing sources are already past
r303291, then you already have those patches.

The first sentence of my mail stated that the backport is from stable/10
to 10.3.

Anyway, I put the patch at https://kib.kiev.ua/kib/vm-10.3-bp.1.patch .


Re: Call for testing: VM bugs in 10.3

2016-08-02 Thread Konstantin Belousov
On Tue, Aug 02, 2016 at 12:47:23PM -0700, pete wright wrote:
> On Aug 2, 2016 12:26 PM, "Konstantin Belousov"  wrote:
> >
> > Below is the merge of some high-profile virtual memory subsystem bug
> > fixes from stable/10 to 10.3. I merged fixes for bugs reported by
> > users, issues which are even theoretically unlikely to occur in real
> > world loads, are not included into the patch set. The later is mostly
> > corrections for the handling of radix insertion failures. Included fixes
> > are for random SIGSEGV delivered to processes, hangs on "vodead" state
> > on filesystem operations, and several others.
> >
> > List of the merged revisions:
> > r301184 prevent parallel object collapses, fixes object lifecycle
> > r301436 do not leak the vm object lock, fixes overcommit disable
> > r302243 avoid the active object marking for vm.vmtotal sysctl, fixes
> > "vodead" hangs
> > r302513 vm_fault() race with the vm_object_collapse(), fixes spurious SIGSEGV
> > r303291 postpone BO_DEAD, fixes panic on fast vnode reclaim
> >
> > I am asking for some testing, it is not necessary for your system to
> > exhibit the problematic behaviour for your testing to be useful. I am
> > more looking for smoke-testing kind of confirmation that patch is fine.
> > Neither I nor people who usually help me with testing,  run 10.3 systems.
> >
> 
> Is testing on 10.3-RELEASE useful, or is this only for people tracking
> STABLE?
This is only for people running 10.3.  The list of merged revisions is from
stable/10, where the fixes were merged a month or more ago.


Re: Call for testing: VM bugs in 10.3

2016-08-02 Thread Ian Lepore
On Tue, 2016-08-02 at 22:25 +0300, Konstantin Belousov wrote:
> Below is the merge of some high-profile virtual memory subsystem bug
> fixes from stable/10 to 10.3. I merged fixes for bugs reported by
> users, issues which are even theoretically unlikely to occur in real
> world loads, are not included into the patch set. The later is mostly
> corrections for the handling of radix insertion failures. Included fixes
> are for random SIGSEGV delivered to processes, hangs on "vodead" state
> on filesystem operations, and several others.
> 
> List of the merged revisions:
> r301184 prevent parallel object collapses, fixes object lifecycle
> r301436 do not leak the vm object lock, fixes overcommit disable
> r302243 avoid the active object marking for vm.vmtotal sysctl, fixes
>   "vodead" hangs
> r302513 vm_fault() race with the vm_object_collapse(), fixes spurious SIGSEGV
> r303291 postpone BO_DEAD, fixes panic on fast vnode reclaim
> 
> I am asking for some testing, it is not necessary for your system to
> exhibit the problematic behaviour for your testing to be useful. I am
> more looking for smoke-testing kind of confirmation that patch is fine.
> Neither I nor people who usually help me with testing,  run 10.3 systems.
> 
> If everything appear to be fine, my intent is to ask re/so to issue
> Errata Notice with these changes in about a week from now.
> 
> Index: sys/kern/vfs_subr.c
> [...]

I run 10-stable on my everyday desktop/build machine, but my mail
client ruined the format of the patches.  Can I just 'svn up' on the
10-stable branch and then MFC the revs you list above, or are there
hand-tweaks to the patches you attached?

-- Ian



Re: Call for testing: VM bugs in 10.3

2016-08-02 Thread pete wright
On Aug 2, 2016 12:26 PM, "Konstantin Belousov"  wrote:
>
> Below is the merge of some high-profile virtual memory subsystem bug
> fixes from stable/10 to 10.3. I merged fixes for bugs reported by
> users, issues which are even theoretically unlikely to occur in real
> world loads, are not included into the patch set. The later is mostly
> corrections for the handling of radix insertion failures. Included fixes
> are for random SIGSEGV delivered to processes, hangs on "vodead" state
> on filesystem operations, and several others.
>
> List of the merged revisions:
> r301184 prevent parallel object collapses, fixes object lifecycle
> r301436 do not leak the vm object lock, fixes overcommit disable
> r302243 avoid the active object marking for vm.vmtotal sysctl, fixes
> "vodead" hangs
> r302513 vm_fault() race with the vm_object_collapse(), fixes spurious SIGSEGV
> r303291 postpone BO_DEAD, fixes panic on fast vnode reclaim
>
> I am asking for some testing, it is not necessary for your system to
> exhibit the problematic behaviour for your testing to be useful. I am
> more looking for smoke-testing kind of confirmation that patch is fine.
> Neither I nor people who usually help me with testing,  run 10.3 systems.
>

Is testing on 10.3-RELEASE useful, or is this only for people tracking
STABLE?

Thanks!
-pete


Call for testing: VM bugs in 10.3

2016-08-02 Thread Konstantin Belousov
Below is the merge of some high-profile virtual memory subsystem bug
fixes from stable/10 to 10.3. I merged fixes for bugs reported by
users; issues which are unlikely to occur in real-world loads, even
theoretically, are not included in the patch set. The latter are mostly
corrections for the handling of radix insertion failures. The included
fixes are for random SIGSEGVs delivered to processes, hangs in the
"vodead" state during filesystem operations, and several others.
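The vm_fault.c hunk of the patch (r302513) changes what happens when a fault finds an object flagged OBJ_DEAD. The fault now fails only if the object's type is already OBJT_DEAD. Otherwise it pauses and retries, because the collapse pass has raced with it. Below is a minimal user-space sketch of just that decision. It assumes simplified, hypothetical stand-ins (`struct sketch_object`, `SKETCH_OBJ_DEAD`, `fault_check_object()`) for the kernel's types and is not the kernel code itself.

```c
#include <assert.h>

/* Hypothetical stand-ins; the real flag and types live in sys/vm/vm_object.h. */
#define SKETCH_OBJ_DEAD 0x1u	/* arbitrary bit chosen for this sketch */

enum sketch_objtype { SKETCH_OBJT_DEFAULT, SKETCH_OBJT_DEAD };

struct sketch_object {
	unsigned int		flags;
	enum sketch_objtype	type;
};

enum fault_action {
	FAULT_CONTINUE,	/* object is alive: keep handling the fault */
	FAULT_RETRY,	/* lost a race with a collapse: pause, then retry */
	FAULT_FAIL	/* object is terminally dead: fail the fault */
};

/*
 * Decision taken while walking the object chain, following the patched
 * vm_fault_hold() logic.  Before the patch, any OBJ_DEAD object produced
 * KERN_PROTECTION_FAILURE, which could surface as a spurious SIGSEGV
 * when the object was merely in the middle of being collapsed.
 */
static enum fault_action
fault_check_object(const struct sketch_object *obj)
{
	if ((obj->flags & SKETCH_OBJ_DEAD) == 0)
		return (FAULT_CONTINUE);
	if (obj->type == SKETCH_OBJT_DEAD)
		return (FAULT_FAIL);
	return (FAULT_RETRY);
}
```

In the real retry path, the patch first drops its locks and references with unlock_and_deallocate(). It then sleeps briefly with pause("vmf_de", 1) before jumping back to RetryFault, which gives the collapse a chance to finish.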

List of the merged revisions:
r301184 prevent parallel object collapses, fixes object lifecycle
r301436 do not leak the vm object lock, fixes overcommit disable
r302243 avoid the active object marking for vm.vmtotal sysctl, fixes
"vodead" hangs
r302513 vm_fault() race with the vm_object_collapse(), fixes spurious SIGSEGV
r303291 postpone BO_DEAD, fixes panic on fast vnode reclaim
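The patch includes r302243, whose vm_meter.c hunk removes a pass over vm_object_list. That pass write-locked every object in order to clear OBJ_ACTIVE. The hunk replaces it with a cheap per-object test, is_object_active(). A minimal sketch of that test follows. It uses a hypothetical two-field `struct sketch_vm_object` in place of the real struct vm_object.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in holding only the two counters that the
 * patch's is_object_active() reads; the real struct vm_object has
 * many more fields.
 */
struct sketch_vm_object {
	int ref_count;		/* all references, including from shadow objects */
	int shadow_count;	/* number of objects shadowing this one */
};

/*
 * Same comparison as is_object_active() in the patch.  If an object has
 * more references than shadows, something other than a shadow
 * (normally a vm_map_entry) probably refers to it.  Transient references
 * are counted too, so the answer is inexact, which the patch accepts for
 * the advisory vm.vmtotal sysctl.  No lock on the global object list and
 * no flag write on the object are needed.
 */
static bool
sketch_is_object_active(const struct sketch_vm_object *obj)
{
	return (obj->ref_count > obj->shadow_count);
}
```

For example, an object referenced once by a mapping and once by a shadow (ref_count 2, shadow_count 1) counts as active. An object referenced only by its shadow (1, 1) does not.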

I am asking for some testing. Your system does not need to exhibit
the problematic behaviour for your testing to be useful; I am mostly
looking for a smoke-testing kind of confirmation that the patch is fine.
Neither I nor the people who usually help me with testing run 10.3 systems.
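For anyone who wants a synthetic load alongside their normal workloads, here is a hypothetical smoke test. It is not part of the patch and is not known to reproduce any of the listed bugs. It repeatedly forks a child that checks and then dirties a private anonymous mapping while the parent also writes to it. Afterwards it verifies that copy-on-write kept the parent's data intact. On FreeBSD this kind of fork and copy-on-write traffic may create shadow objects and later collapse them. The function name `smoke_fork_cow()` and its parameters are illustrative assumptions.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE		/* expose MAP_ANON on glibc; harmless on FreeBSD */
#endif

#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical smoke test.  Returns the number of failed iterations
 * (0 when everything behaved), or -1 if the mapping could not be made.
 */
static int
smoke_fork_cow(int iterations, size_t npages)
{
	size_t pagesz = (size_t)sysconf(_SC_PAGESIZE);
	size_t len = npages * pagesz;
	int failures = 0;
	unsigned char *buf;

	buf = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_ANON | MAP_PRIVATE,
	    -1, 0);
	if (buf == MAP_FAILED)
		return (-1);

	for (int i = 0; i < iterations; i++) {
		/* Never 0xff, so the child's writes are always distinguishable. */
		unsigned char pattern = (unsigned char)(i % 255);
		pid_t pid;
		int status;

		memset(buf, pattern, len);
		pid = fork();
		if (pid == -1) {
			failures++;
			break;
		}
		if (pid == 0) {
			/* Child: check the inherited copy, then dirty every page. */
			for (size_t off = 0; off < len; off += pagesz)
				if (buf[off] != pattern)
					_exit(1);
			memset(buf, 0xff, len);
			_exit(0);
		}
		/* Parent: write to every page while the child may still run. */
		for (size_t off = 0; off < len; off += pagesz)
			buf[off] = pattern;
		if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status) ||
		    WEXITSTATUS(status) != 0)
			failures++;
		/* The child's 0xff writes must never be visible here. */
		for (size_t off = 0; off < len; off++) {
			if (buf[off] != pattern) {
				failures++;
				break;
			}
		}
	}
	(void)munmap(buf, len);
	return (failures);
}
```

A call such as smoke_fork_cow(1000, 256) runs 1000 fork cycles over a 256-page mapping. A nonzero result means one of three things: a child saw wrong data, a child exited abnormally, or the parent's data changed.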

If everything appears to be fine, my intent is to ask re/so to issue
an Errata Notice with these changes in about a week.

Index: sys/kern/vfs_subr.c
===
--- sys/kern/vfs_subr.c (revision 303659)
+++ sys/kern/vfs_subr.c (working copy)
@@ -2934,7 +2934,13 @@ vgonel(struct vnode *vp)
TAILQ_EMPTY(&vp->v_bufobj.bo_clean.bv_hd) &&
vp->v_bufobj.bo_clean.bv_cnt == 0,
("vp %p bufobj not invalidated", vp));
-   vp->v_bufobj.bo_flag |= BO_DEAD;
+
+   /*
+* For VMIO bufobj, BO_DEAD is set in vm_object_terminate()
+* after the object's page queue is flushed.
+*/
+   if (vp->v_bufobj.bo_object == NULL)
+   vp->v_bufobj.bo_flag |= BO_DEAD;
BO_UNLOCK(&vp->v_bufobj);
 
/*
Index: sys/vm/vm_fault.c
===
--- sys/vm/vm_fault.c   (revision 303659)
+++ sys/vm/vm_fault.c   (working copy)
@@ -286,7 +286,7 @@ vm_fault_hold(vm_map_t map, vm_offset_t vaddr, vm_
vm_prot_t prot;
long ahead, behind;
int alloc_req, era, faultcount, nera, reqpage, result;
-   boolean_t growstack, is_first_object_locked, wired;
+   boolean_t dead, growstack, is_first_object_locked, wired;
int map_generation;
vm_object_t next_object;
vm_page_t marray[VM_FAULT_READ_MAX];
@@ -423,11 +423,18 @@ fast_failed:
fs.pindex = fs.first_pindex;
while (TRUE) {
/*
-* If the object is dead, we stop here
+* If the object is marked for imminent termination,
+* we retry here, since the collapse pass has raced
+* with us.  Otherwise, if we see terminally dead
+* object, return fail.
 */
-   if (fs.object->flags & OBJ_DEAD) {
+   if ((fs.object->flags & OBJ_DEAD) != 0) {
+   dead = fs.object->type == OBJT_DEAD;
unlock_and_deallocate();
-   return (KERN_PROTECTION_FAILURE);
+   if (dead)
+   return (KERN_PROTECTION_FAILURE);
+   pause("vmf_de", 1);
+   goto RetryFault;
}
 
/*
Index: sys/vm/vm_meter.c
===
--- sys/vm/vm_meter.c   (revision 303659)
+++ sys/vm/vm_meter.c   (working copy)
@@ -93,30 +93,32 @@ SYSCTL_PROC(_vm, VM_LOADAVG, loadavg, CTLTYPE_STRU
 CTLFLAG_MPSAFE, NULL, 0, sysctl_vm_loadavg, "S,loadavg",
 "Machine loadaverage history");
 
+/*
+ * This function aims to determine if the object is mapped,
+ * specifically, if it is referenced by a vm_map_entry.  Because
+ * objects occasionally acquire transient references that do not
+ * represent a mapping, the method used here is inexact.  However, it
+ * has very low overhead and is good enough for the advisory
+ * vm.vmtotal sysctl.
+ */
+static bool
+is_object_active(vm_object_t obj)
+{
+
+   return (obj->ref_count > obj->shadow_count);
+}
+
 static int
 vmtotal(SYSCTL_HANDLER_ARGS)
 {
-   struct proc *p;
struct vmtotal total;
-   vm_map_entry_t entry;
vm_object_t object;
-   vm_map_t map;
-   int paging;
+   struct proc *p;
struct thread *td;
-   struct vmspace *vm;
 
bzero(&total, sizeof(total));
+
/*
-* Mark all objects as inactive.
-*/
-   mtx_lock(&vm_object_list_mtx);
-   TAILQ_FOREACH(object, &vm_object_list, object_list) {
-   VM_OBJECT_WLOCK(object);
-   vm_object_clear_flag(object, OBJ_ACTIVE);
-   VM_OBJECT_WUNLOCK(object);
-   }
-   mtx_unlock(&vm_object_list_mtx);
-   /*
 *