Re: nginx and FreeBSD11

2016-09-26 Thread Slawa Olhovchenkov
On Mon, Sep 26, 2016 at 06:20:42PM +0300, Konstantin Belousov wrote:

> On Thu, Sep 22, 2016 at 12:33:55PM +0300, Slawa Olhovchenkov wrote:
> > OK, I will try this patch.
> 
> Was the patch tested?

No more AIO-related issues / nginx core dumps.
I can't get long uptime because of other issues (TCP locks and mbuf related).


Re: nginx and FreeBSD11

2016-09-26 Thread Konstantin Belousov
On Thu, Sep 22, 2016 at 12:33:55PM +0300, Slawa Olhovchenkov wrote:
> OK, I will try this patch.

Was the patch tested?


Re: nginx and FreeBSD11

2016-09-22 Thread Konstantin Belousov
On Thu, Sep 22, 2016 at 12:33:55PM +0300, Slawa Olhovchenkov wrote:
> Do you still need the first 100 lines from the verbose boot?
No.


Re: nginx and FreeBSD11

2016-09-22 Thread Slawa Olhovchenkov
On Thu, Sep 22, 2016 at 11:53:20AM +0300, Konstantin Belousov wrote:

> On Thu, Sep 22, 2016 at 11:34:24AM +0300, Slawa Olhovchenkov wrote:
> > On Thu, Sep 22, 2016 at 11:27:40AM +0300, Konstantin Belousov wrote:
> > 
> > > On Thu, Sep 22, 2016 at 11:25:27AM +0300, Slawa Olhovchenkov wrote:
> > > > On Thu, Sep 22, 2016 at 10:59:33AM +0300, Konstantin Belousov wrote:
> > > > > Below is, I believe, the committable fix, of course supposing that
> > > > > the patch above worked. If you want to retest it on stable/11, ignore
> > > > > efirt.c chunks.
> > > > 
> > > > and remove patch w/ spinlock?
> > > Yes.
> > 
> > What do you prefer now -- should I test the spinlock patch or this patch?
> > For a successful result I need to wait 2-3 days in either case.
> 
> If you have already been running the previous (spinlock) version for a day, then finish
> with it. I am confident that the spinlock version's results are indicative for
> the refined patch as well.
> 
> If you did not apply the spinlock variant at all, there is no reason to
> spend effort on it; use the patch I sent today.

No, I did not apply the spinlock variant at all.
OK, I will try this patch.
Do you still need the first 100 lines from the verbose boot?


Re: nginx and FreeBSD11

2016-09-22 Thread Konstantin Belousov
On Thu, Sep 22, 2016 at 11:34:24AM +0300, Slawa Olhovchenkov wrote:
> On Thu, Sep 22, 2016 at 11:27:40AM +0300, Konstantin Belousov wrote:
> 
> > On Thu, Sep 22, 2016 at 11:25:27AM +0300, Slawa Olhovchenkov wrote:
> > > On Thu, Sep 22, 2016 at 10:59:33AM +0300, Konstantin Belousov wrote:
> > > > Below is, I believe, the committable fix, of course supposing that
> > > > the patch above worked. If you want to retest it on stable/11, ignore
> > > > efirt.c chunks.
> > > 
> > > and remove patch w/ spinlock?
> > Yes.
> 
> What do you prefer now -- should I test the spinlock patch or this patch?
> For a successful result I need to wait 2-3 days in either case.

If you have already been running the previous (spinlock) version for a day, then finish
with it. I am confident that the spinlock version's results are indicative for
the refined patch as well.

If you did not apply the spinlock variant at all, there is no reason to
spend effort on it; use the patch I sent today.


Re: nginx and FreeBSD11

2016-09-22 Thread Slawa Olhovchenkov
On Thu, Sep 22, 2016 at 11:27:40AM +0300, Konstantin Belousov wrote:

> On Thu, Sep 22, 2016 at 11:25:27AM +0300, Slawa Olhovchenkov wrote:
> > On Thu, Sep 22, 2016 at 10:59:33AM +0300, Konstantin Belousov wrote:
> > > Below is, I believe, the committable fix, of course supposing that
> > > the patch above worked. If you want to retest it on stable/11, ignore
> > > efirt.c chunks.
> > 
> > and remove patch w/ spinlock?
> Yes.

What do you prefer now -- should I test the spinlock patch or this patch?
For a successful result I need to wait 2-3 days in either case.


Re: nginx and FreeBSD11

2016-09-22 Thread Konstantin Belousov
On Thu, Sep 22, 2016 at 11:25:27AM +0300, Slawa Olhovchenkov wrote:
> On Thu, Sep 22, 2016 at 10:59:33AM +0300, Konstantin Belousov wrote:
> > Below is, I believe, the committable fix, of course supposing that
> > the patch above worked. If you want to retest it on stable/11, ignore
> > efirt.c chunks.
> 
> and remove patch w/ spinlock?
Yes.


Re: nginx and FreeBSD11

2016-09-22 Thread Slawa Olhovchenkov
On Thu, Sep 22, 2016 at 10:59:33AM +0300, Konstantin Belousov wrote:

> On Wed, Sep 21, 2016 at 12:15:17AM +0300, Konstantin Belousov wrote:
> > > > diff --git a/sys/vm/vm_map.c b/sys/vm/vm_map.c
> > > > index a23468e..f754652 100644
> > > > --- a/sys/vm/vm_map.c
> > > > +++ b/sys/vm/vm_map.c
> > > > @@ -481,6 +481,7 @@ vmspace_switch_aio(struct vmspace *newvm)
> > > > if (oldvm == newvm)
> > > > return;
> > > >  
> > > > +   spinlock_enter();
> > > > /*
> > > >  * Point to the new address space and refer to it.
> > > >  */
> > > > @@ -489,6 +490,7 @@ vmspace_switch_aio(struct vmspace *newvm)
> > > >  
> > > > /* Activate the new mapping. */
> > > > pmap_activate(curthread);
> > > > +   spinlock_exit();
> > > >  
> > > > /* Remove the daemon's reference to the old address space. */
> > > > KASSERT(oldvm->vm_refcnt > 1,
> Did you test the patch?

I have now installed it.
For a successful test I need 2-3 days.
If the test fails, the result may come quickly.

> Below is, I believe, the committable fix, of course supposing that
> the patch above worked. If you want to retest it on stable/11, ignore
> efirt.c chunks.

and remove patch w/ spinlock?

> diff --git a/sys/amd64/amd64/efirt.c b/sys/amd64/amd64/efirt.c
> index f1d67f7..c883af8 100644
> --- a/sys/amd64/amd64/efirt.c
> +++ b/sys/amd64/amd64/efirt.c
> @@ -53,6 +53,7 @@ __FBSDID("$FreeBSD$");
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -301,6 +302,17 @@ efi_enter(void)
>   PMAP_UNLOCK(curpmap);
>   return (error);
>   }
> +
> + /*
> +  * IPI TLB shootdown handler invltlb_pcid_handler() reloads
> +  * %cr3 from the curpmap->pm_cr3, which would disable runtime
> +  * segments mappings.  Block the handler's action by setting
> +  * curpmap to impossible value.  See also comment in
> +  * pmap.c:pmap_activate_sw().
> +  */
> + if (pmap_pcid_enabled && !invpcid_works)
> + PCPU_SET(curpmap, NULL);
> +
>   load_cr3(VM_PAGE_TO_PHYS(efi_pml4_page) | (pmap_pcid_enabled ?
>   curpmap->pm_pcids[PCPU_GET(cpuid)].pm_pcid : 0));
>   /*
> @@ -317,7 +329,9 @@ efi_leave(void)
>  {
>   pmap_t curpmap;
>  
> - curpmap = PCPU_GET(curpmap);
> + curpmap = &curproc->p_vmspace->vm_pmap;
> + if (pmap_pcid_enabled && !invpcid_works)
> + PCPU_SET(curpmap, curpmap);
>   load_cr3(curpmap->pm_cr3 | (pmap_pcid_enabled ?
>   curpmap->pm_pcids[PCPU_GET(cpuid)].pm_pcid : 0));
>   if (!pmap_pcid_enabled)
> diff --git a/sys/amd64/amd64/pmap.c b/sys/amd64/amd64/pmap.c
> index 63042e4..59e1b67 100644
> --- a/sys/amd64/amd64/pmap.c
> +++ b/sys/amd64/amd64/pmap.c
> @@ -6842,6 +6842,7 @@ pmap_activate_sw(struct thread *td)
>  {
>   pmap_t oldpmap, pmap;
>   uint64_t cached, cr3;
> + register_t rflags;
>   u_int cpuid;
>  
>   oldpmap = PCPU_GET(curpmap);
> @@ -6865,16 +6866,43 @@ pmap_activate_sw(struct thread *td)
>   pmap == kernel_pmap,
>   ("non-kernel pmap thread %p pmap %p cpu %d pcid %#x",
>   td, pmap, cpuid, pmap->pm_pcids[cpuid].pm_pcid));
> +
> + /*
> +  * If the INVPCID instruction is not available,
> +  * invltlb_pcid_handler() is used to handle the
> +  * invalidate_all IPI, which checks for curpmap ==
> +  * smp_tlb_pmap.  The operation sequence below has a
> +  * window where %CR3 is loaded with the new pmap's
> +  * PML4 address, but curpmap value is not yet updated.
> +  * This causes invltlb IPI handler, called between the
> +  * updates, to execute as NOP, which leaves stale TLB
> +  * entries.
> +  *
> +  * Note that the most typical use of
> +  * pmap_activate_sw(), from the context switch, is
> +  * immune to this race, because interrupts are
> +  * disabled (while the thread lock is owned), and IPI
> +  * happens after curpmap is updated.  Protect other
> +  * callers in a similar way, by disabling interrupts
> +  * around the %cr3 register reload and curpmap
> +  * assignment.
> +  */
> + if (!invpcid_works)
> + rflags = intr_disable();
> +
>   if (!cached || (cr3 & ~CR3_PCID_MASK) != pmap->pm_cr3) {
>   load_cr3(pmap->pm_cr3 | pmap->pm_pcids[cpuid].pm_pcid |
>   cached);
>   if (cached)
>   PCPU_INC(pm_save_cnt);
>   }
> + PCPU_SET(curpmap, pmap);
> + if (!invpcid_works)
> + intr_restore(rflags);
>   } else if (cr3 != pmap->pm_cr3) {
>   load_cr3(pmap->pm_cr3);
> + PCPU_SET(curpmap, pmap);
>   }
> -

Re: nginx and FreeBSD11

2016-09-22 Thread Konstantin Belousov
On Wed, Sep 21, 2016 at 12:15:17AM +0300, Konstantin Belousov wrote:
> > > diff --git a/sys/vm/vm_map.c b/sys/vm/vm_map.c
> > > index a23468e..f754652 100644
> > > --- a/sys/vm/vm_map.c
> > > +++ b/sys/vm/vm_map.c
> > > @@ -481,6 +481,7 @@ vmspace_switch_aio(struct vmspace *newvm)
> > >   if (oldvm == newvm)
> > >   return;
> > >  
> > > + spinlock_enter();
> > >   /*
> > >* Point to the new address space and refer to it.
> > >*/
> > > @@ -489,6 +490,7 @@ vmspace_switch_aio(struct vmspace *newvm)
> > >  
> > >   /* Activate the new mapping. */
> > >   pmap_activate(curthread);
> > > + spinlock_exit();
> > >  
> > >   /* Remove the daemon's reference to the old address space. */
> > >   KASSERT(oldvm->vm_refcnt > 1,
Did you test the patch?

Below is, I believe, the committable fix, of course supposing that
the patch above worked. If you want to retest it on stable/11, ignore
efirt.c chunks.

diff --git a/sys/amd64/amd64/efirt.c b/sys/amd64/amd64/efirt.c
index f1d67f7..c883af8 100644
--- a/sys/amd64/amd64/efirt.c
+++ b/sys/amd64/amd64/efirt.c
@@ -53,6 +53,7 @@ __FBSDID("$FreeBSD$");
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -301,6 +302,17 @@ efi_enter(void)
PMAP_UNLOCK(curpmap);
return (error);
}
+
+   /*
+* IPI TLB shootdown handler invltlb_pcid_handler() reloads
+* %cr3 from the curpmap->pm_cr3, which would disable runtime
+* segments mappings.  Block the handler's action by setting
+* curpmap to impossible value.  See also comment in
+* pmap.c:pmap_activate_sw().
+*/
+   if (pmap_pcid_enabled && !invpcid_works)
+   PCPU_SET(curpmap, NULL);
+
load_cr3(VM_PAGE_TO_PHYS(efi_pml4_page) | (pmap_pcid_enabled ?
curpmap->pm_pcids[PCPU_GET(cpuid)].pm_pcid : 0));
/*
@@ -317,7 +329,9 @@ efi_leave(void)
 {
pmap_t curpmap;
 
-   curpmap = PCPU_GET(curpmap);
+   curpmap = &curproc->p_vmspace->vm_pmap;
+   if (pmap_pcid_enabled && !invpcid_works)
+   PCPU_SET(curpmap, curpmap);
load_cr3(curpmap->pm_cr3 | (pmap_pcid_enabled ?
curpmap->pm_pcids[PCPU_GET(cpuid)].pm_pcid : 0));
if (!pmap_pcid_enabled)
diff --git a/sys/amd64/amd64/pmap.c b/sys/amd64/amd64/pmap.c
index 63042e4..59e1b67 100644
--- a/sys/amd64/amd64/pmap.c
+++ b/sys/amd64/amd64/pmap.c
@@ -6842,6 +6842,7 @@ pmap_activate_sw(struct thread *td)
 {
pmap_t oldpmap, pmap;
uint64_t cached, cr3;
+   register_t rflags;
u_int cpuid;
 
oldpmap = PCPU_GET(curpmap);
@@ -6865,16 +6866,43 @@ pmap_activate_sw(struct thread *td)
pmap == kernel_pmap,
("non-kernel pmap thread %p pmap %p cpu %d pcid %#x",
td, pmap, cpuid, pmap->pm_pcids[cpuid].pm_pcid));
+
+   /*
+* If the INVPCID instruction is not available,
+* invltlb_pcid_handler() is used to handle the
+* invalidate_all IPI, which checks for curpmap ==
+* smp_tlb_pmap.  The operation sequence below has a
+* window where %CR3 is loaded with the new pmap's
+* PML4 address, but curpmap value is not yet updated.
+* This causes invltlb IPI handler, called between the
+* updates, to execute as NOP, which leaves stale TLB
+* entries.
+*
+* Note that the most typical use of
+* pmap_activate_sw(), from the context switch, is
+* immune to this race, because interrupts are
+* disabled (while the thread lock is owned), and IPI
+* happens after curpmap is updated.  Protect other
+* callers in a similar way, by disabling interrupts
+* around the %cr3 register reload and curpmap
+* assignment.
+*/
+   if (!invpcid_works)
+   rflags = intr_disable();
+
if (!cached || (cr3 & ~CR3_PCID_MASK) != pmap->pm_cr3) {
load_cr3(pmap->pm_cr3 | pmap->pm_pcids[cpuid].pm_pcid |
cached);
if (cached)
PCPU_INC(pm_save_cnt);
}
+   PCPU_SET(curpmap, pmap);
+   if (!invpcid_works)
+   intr_restore(rflags);
} else if (cr3 != pmap->pm_cr3) {
load_cr3(pmap->pm_cr3);
+   PCPU_SET(curpmap, pmap);
}
-   PCPU_SET(curpmap, pmap);
 #ifdef SMP
CPU_CLR_ATOMIC(cpuid, &oldpmap->pm_active);
 #else


Re: nginx and FreeBSD11

2016-09-20 Thread Slawa Olhovchenkov
On Tue, Sep 20, 2016 at 04:00:10PM -0600, Warner Losh wrote:

> >> > > Is this sandy bridge ?
> >> >
> >> > Sandy Bridge EP
> >> >
> >> > > Show me first 100 lines of the verbose dmesg,
> >> >
> >> > After day or two, after end of this test run -- I am need to enable 
> >> > verbose.
> >> >
> >> > > I want to see cpu features lines.  In particular, does you CPU support
> >> > > the INVPCID feature.
> >> >
> >> > CPU: Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz (2000.05-MHz K8-class CPU)
> >> >   Origin="GenuineIntel"  Id=0x206d7  Family=0x6  Model=0x2d  Stepping=7
> >> >   
> >> > Features=0xbfebfbff
> >> >   
> >> > Features2=0x1fbee3ff
> >> >   AMD Features=0x2c100800
> >> >   AMD Features2=0x1
> >> >   XSAVE Features=0x1
> >> >   VT-x: PAT,HLT,MTF,PAUSE,EPT,UG,VPID
> >> >   TSC: P-state invariant, performance statistics
> >> >
> >> > I am don't see this feature before E5v3:
> >> >
> >> > CPU: Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz (2600.06-MHz K8-class CPU)
> >> >   Origin="GenuineIntel"  Id=0x306e4  Family=0x6  Model=0x3e  Stepping=4
> >> >   
> >> > Features=0xbfebfbff
> >> >   
> >> > Features2=0x7fbee3ff
> >> >   AMD Features=0x2c100800
> >> >   AMD Features2=0x1
> >> >   Structured Extended Features=0x281
> >> >   XSAVE Features=0x1
> >> >   VT-x: PAT,HLT,MTF,PAUSE,EPT,UG,VPID,VID,PostIntr
> >> >   TSC: P-state invariant, performance statistics
> >> >
> >> > (don't run 11.0 on this CPU)
> >> Ok.
> >>
> >> >
> >> > CPU: Intel(R) Xeon(R) CPU E5-2640 v3 @ 2.60GHz (2600.05-MHz K8-class CPU)
> >> >   Origin="GenuineIntel"  Id=0x306f2  Family=0x6  Model=0x3f  Stepping=2
> >> >   
> >> > Features=0xbfebfbff
> >> >   
> >> > Features2=0x7ffefbff
> >> >   AMD Features=0x2c100800
> >> >   AMD Features2=0x21
> >> >   Structured Extended 
> >> > Features=0x37ab
> >> >   XSAVE Features=0x1
> >> >   VT-x: PAT,HLT,MTF,PAUSE,EPT,UG,VPID,VID,PostIntr
> >> >   TSC: P-state invariant, performance statistics
> >> >
> >> > (11.0 run w/o this issuse)
> >> Do you mean that similarly configured nginx+aio do not demonstrate the 
> >> corruption on this machine ?
> >
> > Yes.
> > But different storage configuration and different pattern load.
> >
> > Also 11.0 run w/o this issuse on
> >
> > CPU: Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz (2200.04-MHz K8-class CPU)
> >   Origin="GenuineIntel"  Id=0x406f1  Family=0x6  Model=0x4f  Stepping=1
> >   
> > Features=0xbfebfbff
> >   
> > Features2=0x7ffefbff
> >   AMD Features=0x2c100800
> >   AMD Features2=0x121
> >   Structured Extended 
> > Features=0x21cbfbb
> >   XSAVE Features=0x1
> >   VT-x: PAT,HLT,MTF,PAUSE,EPT,UG,VPID,VID,PostIntr
> >   TSC: P-state invariant, performance statistics
> >
> > PS: all systems is dual-cpu.
> 
> Does this mean 2 cores or two sockets? We've seen a similar hang with
> the following CPU:

Two sockets. Not sure how important this is, just for the record.
Your system is also w/o the INVPCID feature (per kib's question).
Maybe your case will also be resolved by vm.pmap.pcid_enabled=0?

> CPU: Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz (2700.06-MHz K8-class CPU)
>   Origin="GenuineIntel"  Id=0x306e4  Family=0x6  Model=0x3e  Stepping=4
>  
> Features=0xbfebfbff
>  
> Features2=0x7fbee3ff
>   AMD Features=0x2c100800
>   AMD Features2=0x1
>   Structured Extended Features=0x281
>   XSAVE Features=0x1
>   VT-x: PAT,HLT,MTF,PAUSE,EPT,UG,VPID,VID,PostIntr
>   TSC: P-state invariant, performance statistics
> real memory  = 274877906944 (262144 MB)
> avail memory = 267146330112 (254770 MB)
> 
> 12 cores x 2 SMT x 1 socket
> 
> Warner


Re: nginx and FreeBSD11

2016-09-20 Thread Warner Losh
On Tue, Sep 20, 2016 at 3:47 PM, Slawa Olhovchenkov  wrote:
> On Wed, Sep 21, 2016 at 12:15:17AM +0300, Konstantin Belousov wrote:
>
>> On Tue, Sep 20, 2016 at 11:38:54PM +0300, Slawa Olhovchenkov wrote:
>> > On Tue, Sep 20, 2016 at 11:19:25PM +0300, Konstantin Belousov wrote:
>> >
>> > > On Tue, Sep 20, 2016 at 10:20:53PM +0300, Slawa Olhovchenkov wrote:
>> > > > On Tue, Sep 20, 2016 at 09:52:44AM +0300, Slawa Olhovchenkov wrote:
>> > > >
>> > > > > On Mon, Sep 19, 2016 at 06:05:46PM -0700, John Baldwin wrote:
>> > > > >
>> > > > > > > > If this panics, then vmspace_switch_aio() is not working for
>> > > > > > > > some reason.
>> > > > > > >
>> > > > > > > I am try using next DTrace script:
>> > > > > > > 
>> > > > > > > #pragma D option dynvarsize=64m
>> > > > > > >
>> > > > > > > int req[struct vmspace  *, void *];
>> > > > > > > self int trace;
>> > > > > > >
>> > > > > > > syscall:freebsd:aio_read:entry
>> > > > > > > {
>> > > > > > > this->aio = *(struct aiocb *)copyin(arg0, sizeof(struct 
>> > > > > > > aiocb));
>> > > > > > > req[curthread->td_proc->p_vmspace, this->aio.aio_buf] = 
>> > > > > > > curthread->td_proc->p_pid;
>> > > > > > > }
>> > > > > > >
>> > > > > > > fbt:kernel:aio_process_rw:entry
>> > > > > > > {
>> > > > > > > self->job = args[0];
>> > > > > > > self->trace = 1;
>> > > > > > > }
>> > > > > > >
>> > > > > > > fbt:kernel:aio_process_rw:return
>> > > > > > > /self->trace/
>> > > > > > > {
>> > > > > > > req[self->job->userproc->p_vmspace, 
>> > > > > > > self->job->uaiocb.aio_buf] = 0;
>> > > > > > > self->job = 0;
>> > > > > > > self->trace = 0;
>> > > > > > > }
>> > > > > > >
>> > > > > > > fbt:kernel:vn_io_fault:entry
>> > > > > > > /self->trace && !req[curthread->td_proc->p_vmspace, 
>> > > > > > > args[1]->uio_iov[0].iov_base]/
>> > > > > > > {
>> > > > > > > this->buf = args[1]->uio_iov[0].iov_base;
>> > > > > > > printf("%Y vn_io_fault %p:%p pid %d\n", walltimestamp, 
>> > > > > > > curthread->td_proc->p_vmspace, this->buf, 
>> > > > > > > req[curthread->td_proc->p_vmspace, this->buf]);
>> > > > > > > }
>> > > > > > > ===
>> > > > > > >
>> > > > > > > And don't got any messages near nginx core dump.
>> > > > > > > What I can check next?
>> > > > > > > May be check context/address space switch for kernel process?
>> > > > > >
>> > > > > > Which CPU are you using?
>> > > > >
>> > > > > CPU: Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz (2000.04-MHz K8-class 
>> > > > > CPU)
>> > > Is this sandy bridge ?
>> >
>> > Sandy Bridge EP
>> >
>> > > Show me first 100 lines of the verbose dmesg,
>> >
>> > After day or two, after end of this test run -- I am need to enable 
>> > verbose.
>> >
>> > > I want to see cpu features lines.  In particular, does you CPU support
>> > > the INVPCID feature.
>> >
>> > CPU: Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz (2000.05-MHz K8-class CPU)
>> >   Origin="GenuineIntel"  Id=0x206d7  Family=0x6  Model=0x2d  Stepping=7
>> >   
>> > Features=0xbfebfbff
>> >   
>> > Features2=0x1fbee3ff
>> >   AMD Features=0x2c100800
>> >   AMD Features2=0x1
>> >   XSAVE Features=0x1
>> >   VT-x: PAT,HLT,MTF,PAUSE,EPT,UG,VPID
>> >   TSC: P-state invariant, performance statistics
>> >
>> > I am don't see this feature before E5v3:
>> >
>> > CPU: Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz (2600.06-MHz K8-class CPU)
>> >   Origin="GenuineIntel"  Id=0x306e4  Family=0x6  Model=0x3e  Stepping=4
>> >   
>> > Features=0xbfebfbff
>> >   
>> > Features2=0x7fbee3ff
>> >   AMD Features=0x2c100800
>> >   AMD Features2=0x1
>> >   Structured Extended Features=0x281
>> >   XSAVE Features=0x1
>> >   VT-x: PAT,HLT,MTF,PAUSE,EPT,UG,VPID,VID,PostIntr
>> >   TSC: P-state invariant, performance statistics
>> >
>> > (don't run 11.0 on this CPU)
>> Ok.
>>
>> >
>> > CPU: Intel(R) Xeon(R) CPU E5-2640 v3 @ 2.60GHz (2600.05-MHz K8-class CPU)
>> >   Origin="GenuineIntel"  Id=0x306f2  Family=0x6  Model=0x3f  Stepping=2
>> >   
>> > Features=0xbfebfbff
>> >   
>> > Features2=0x7ffefbff
>> >   AMD Features=0x2c100800
>> >   AMD Features2=0x21
>> >   Structured Extended 
>> > Features=0x37ab
>> >   XSAVE Features=0x1
>> >   VT-x: PAT,HLT,MTF,PAUSE,EPT,UG,VPID,VID,PostIntr
>> >   TSC: P-state invariant, performance statistics
>> >
>> > (11.0 run w/o this issuse)
>> Do you mean that similarly configured nginx+aio do not demonstrate the 
>> corruption on this machine ?
>
> Yes.
> But different storage configuration and different pattern load.
>
> Also 11.0 run w/o this issuse on
>
> CPU: Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz (2200.04-MHz K8-class CPU)
>   Origin="GenuineIntel"  Id=0x406f1  Family=0x6  Model=0x4f  Stepping=1
>   
> Features=0xbfebfbff
>   
> Features2=0x7ffefbff
>   AMD Features=0x2c100800
>   AMD Features2=0x121
>   Structured Extended 
> Features=0x21cbfbb
>   XSAVE Features=0x1
>   VT-x: PAT,HLT,MTF,PAUSE,EPT,UG,VPID,VID,PostIntr
>   TSC: P-state invariant, performance statistics
>
> PS: all systems is dual-cpu.


Re: nginx and FreeBSD11

2016-09-20 Thread Slawa Olhovchenkov
On Wed, Sep 21, 2016 at 12:15:17AM +0300, Konstantin Belousov wrote:

> On Tue, Sep 20, 2016 at 11:38:54PM +0300, Slawa Olhovchenkov wrote:
> > On Tue, Sep 20, 2016 at 11:19:25PM +0300, Konstantin Belousov wrote:
> > 
> > > On Tue, Sep 20, 2016 at 10:20:53PM +0300, Slawa Olhovchenkov wrote:
> > > > On Tue, Sep 20, 2016 at 09:52:44AM +0300, Slawa Olhovchenkov wrote:
> > > > 
> > > > > On Mon, Sep 19, 2016 at 06:05:46PM -0700, John Baldwin wrote:
> > > > > 
> > > > > > > > If this panics, then vmspace_switch_aio() is not working for
> > > > > > > > some reason.
> > > > > > > 
> > > > > > > I am try using next DTrace script:
> > > > > > > 
> > > > > > > #pragma D option dynvarsize=64m
> > > > > > > 
> > > > > > > int req[struct vmspace  *, void *];
> > > > > > > self int trace;
> > > > > > > 
> > > > > > > syscall:freebsd:aio_read:entry
> > > > > > > {
> > > > > > > this->aio = *(struct aiocb *)copyin(arg0, sizeof(struct 
> > > > > > > aiocb));
> > > > > > > req[curthread->td_proc->p_vmspace, this->aio.aio_buf] = 
> > > > > > > curthread->td_proc->p_pid; 
> > > > > > > }
> > > > > > > 
> > > > > > > fbt:kernel:aio_process_rw:entry
> > > > > > > {
> > > > > > > self->job = args[0];
> > > > > > > self->trace = 1;
> > > > > > > }
> > > > > > > 
> > > > > > > fbt:kernel:aio_process_rw:return
> > > > > > > /self->trace/
> > > > > > > {
> > > > > > > req[self->job->userproc->p_vmspace, 
> > > > > > > self->job->uaiocb.aio_buf] = 0;
> > > > > > > self->job = 0;
> > > > > > > self->trace = 0;
> > > > > > > }
> > > > > > > 
> > > > > > > fbt:kernel:vn_io_fault:entry
> > > > > > > /self->trace && !req[curthread->td_proc->p_vmspace, 
> > > > > > > args[1]->uio_iov[0].iov_base]/
> > > > > > > {
> > > > > > > this->buf = args[1]->uio_iov[0].iov_base;
> > > > > > > printf("%Y vn_io_fault %p:%p pid %d\n", walltimestamp, 
> > > > > > > curthread->td_proc->p_vmspace, this->buf, 
> > > > > > > req[curthread->td_proc->p_vmspace, this->buf]);
> > > > > > > }
> > > > > > > ===
> > > > > > > 
> > > > > > > And don't got any messages near nginx core dump.
> > > > > > > What I can check next?
> > > > > > > May be check context/address space switch for kernel process?
> > > > > > 
> > > > > > Which CPU are you using?
> > > > > 
> > > > > CPU: Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz (2000.04-MHz K8-class 
> > > > > CPU)
> > > Is this sandy bridge ?
> > 
> > Sandy Bridge EP
> > 
> > > Show me first 100 lines of the verbose dmesg,
> > 
> > After day or two, after end of this test run -- I am need to enable verbose.
> > 
> > > I want to see cpu features lines.  In particular, does you CPU support
> > > the INVPCID feature.
> > 
> > CPU: Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz (2000.05-MHz K8-class CPU)
> >   Origin="GenuineIntel"  Id=0x206d7  Family=0x6  Model=0x2d  Stepping=7
> >   
> > Features=0xbfebfbff
> >   
> > Features2=0x1fbee3ff
> >   AMD Features=0x2c100800
> >   AMD Features2=0x1
> >   XSAVE Features=0x1
> >   VT-x: PAT,HLT,MTF,PAUSE,EPT,UG,VPID
> >   TSC: P-state invariant, performance statistics
> > 
> > I am don't see this feature before E5v3:
> > 
> > CPU: Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz (2600.06-MHz K8-class CPU)
> >   Origin="GenuineIntel"  Id=0x306e4  Family=0x6  Model=0x3e  Stepping=4
> >   
> > Features=0xbfebfbff
> >   
> > Features2=0x7fbee3ff
> >   AMD Features=0x2c100800
> >   AMD Features2=0x1
> >   Structured Extended Features=0x281
> >   XSAVE Features=0x1
> >   VT-x: PAT,HLT,MTF,PAUSE,EPT,UG,VPID,VID,PostIntr
> >   TSC: P-state invariant, performance statistics
> > 
> > (don't run 11.0 on this CPU)
> Ok.
> 
> > 
> > CPU: Intel(R) Xeon(R) CPU E5-2640 v3 @ 2.60GHz (2600.05-MHz K8-class CPU)
> >   Origin="GenuineIntel"  Id=0x306f2  Family=0x6  Model=0x3f  Stepping=2
> >   
> > Features=0xbfebfbff
> >   
> > Features2=0x7ffefbff
> >   AMD Features=0x2c100800
> >   AMD Features2=0x21
> >   Structured Extended 
> > Features=0x37ab
> >   XSAVE Features=0x1
> >   VT-x: PAT,HLT,MTF,PAUSE,EPT,UG,VPID,VID,PostIntr
> >   TSC: P-state invariant, performance statistics
> > 
> > (11.0 run w/o this issuse)
> Do you mean that similarly configured nginx+aio do not demonstrate the 
> corruption on this machine ?

Yes.
But with a different storage configuration and a different load pattern.

Also, 11.0 runs w/o this issue on

CPU: Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz (2200.04-MHz K8-class CPU)
  Origin="GenuineIntel"  Id=0x406f1  Family=0x6  Model=0x4f  Stepping=1
  
Features=0xbfebfbff
  
Features2=0x7ffefbff
  AMD Features=0x2c100800
  AMD Features2=0x121
  Structured Extended 
Features=0x21cbfbb
  XSAVE Features=0x1
  VT-x: PAT,HLT,MTF,PAUSE,EPT,UG,VPID,VID,PostIntr
  TSC: P-state invariant, performance statistics

PS: all systems are dual-CPU.


Re: nginx and FreeBSD11

2016-09-20 Thread Konstantin Belousov
On Tue, Sep 20, 2016 at 11:38:54PM +0300, Slawa Olhovchenkov wrote:
> On Tue, Sep 20, 2016 at 11:19:25PM +0300, Konstantin Belousov wrote:
> 
> > On Tue, Sep 20, 2016 at 10:20:53PM +0300, Slawa Olhovchenkov wrote:
> > > On Tue, Sep 20, 2016 at 09:52:44AM +0300, Slawa Olhovchenkov wrote:
> > > 
> > > > On Mon, Sep 19, 2016 at 06:05:46PM -0700, John Baldwin wrote:
> > > > 
> > > > > > > If this panics, then vmspace_switch_aio() is not working for
> > > > > > > some reason.
> > > > > > 
> > > > > > I am try using next DTrace script:
> > > > > > 
> > > > > > #pragma D option dynvarsize=64m
> > > > > > 
> > > > > > int req[struct vmspace  *, void *];
> > > > > > self int trace;
> > > > > > 
> > > > > > syscall:freebsd:aio_read:entry
> > > > > > {
> > > > > > this->aio = *(struct aiocb *)copyin(arg0, sizeof(struct 
> > > > > > aiocb));
> > > > > > req[curthread->td_proc->p_vmspace, this->aio.aio_buf] = 
> > > > > > curthread->td_proc->p_pid; 
> > > > > > }
> > > > > > 
> > > > > > fbt:kernel:aio_process_rw:entry
> > > > > > {
> > > > > > self->job = args[0];
> > > > > > self->trace = 1;
> > > > > > }
> > > > > > 
> > > > > > fbt:kernel:aio_process_rw:return
> > > > > > /self->trace/
> > > > > > {
> > > > > > req[self->job->userproc->p_vmspace, 
> > > > > > self->job->uaiocb.aio_buf] = 0;
> > > > > > self->job = 0;
> > > > > > self->trace = 0;
> > > > > > }
> > > > > > 
> > > > > > fbt:kernel:vn_io_fault:entry
> > > > > > /self->trace && !req[curthread->td_proc->p_vmspace, 
> > > > > > args[1]->uio_iov[0].iov_base]/
> > > > > > {
> > > > > > this->buf = args[1]->uio_iov[0].iov_base;
> > > > > > printf("%Y vn_io_fault %p:%p pid %d\n", walltimestamp, 
> > > > > > curthread->td_proc->p_vmspace, this->buf, 
> > > > > > req[curthread->td_proc->p_vmspace, this->buf]);
> > > > > > }
> > > > > > ===
> > > > > > 
> > > > > > And don't got any messages near nginx core dump.
> > > > > > What I can check next?
> > > > > > May be check context/address space switch for kernel process?
> > > > > 
> > > > > Which CPU are you using?
> > > > 
> > > > CPU: Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz (2000.04-MHz K8-class CPU)
> > Is this sandy bridge ?
> 
> Sandy Bridge EP
> 
> > Show me first 100 lines of the verbose dmesg,
> 
> After day or two, after end of this test run -- I am need to enable verbose.
> 
> > I want to see cpu features lines.  In particular, does you CPU support
> > the INVPCID feature.
> 
> CPU: Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz (2000.05-MHz K8-class CPU)
>   Origin="GenuineIntel"  Id=0x206d7  Family=0x6  Model=0x2d  Stepping=7
>   
> Features=0xbfebfbff
>   
> Features2=0x1fbee3ff
>   AMD Features=0x2c100800
>   AMD Features2=0x1
>   XSAVE Features=0x1
>   VT-x: PAT,HLT,MTF,PAUSE,EPT,UG,VPID
>   TSC: P-state invariant, performance statistics
> 
> I am don't see this feature before E5v3:
> 
> CPU: Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz (2600.06-MHz K8-class CPU)
>   Origin="GenuineIntel"  Id=0x306e4  Family=0x6  Model=0x3e  Stepping=4
>   
> Features=0xbfebfbff
>   
> Features2=0x7fbee3ff
>   AMD Features=0x2c100800
>   AMD Features2=0x1
>   Structured Extended Features=0x281
>   XSAVE Features=0x1
>   VT-x: PAT,HLT,MTF,PAUSE,EPT,UG,VPID,VID,PostIntr
>   TSC: P-state invariant, performance statistics
> 
> (don't run 11.0 on this CPU)
Ok.

> 
> CPU: Intel(R) Xeon(R) CPU E5-2640 v3 @ 2.60GHz (2600.05-MHz K8-class CPU)
>   Origin="GenuineIntel"  Id=0x306f2  Family=0x6  Model=0x3f  Stepping=2
>   
> Features=0xbfebfbff
>   
> Features2=0x7ffefbff
>   AMD Features=0x2c100800
>   AMD Features2=0x21
>   Structured Extended 
> Features=0x37ab
>   XSAVE Features=0x1
>   VT-x: PAT,HLT,MTF,PAUSE,EPT,UG,VPID,VID,PostIntr
>   TSC: P-state invariant, performance statistics
> 
> (11.0 run w/o this issuse)
Do you mean that similarly configured nginx+aio does not demonstrate the
corruption on this machine?

> 
> > Also you may show me the 'sysctl vm.pmap' output.
> 
> # sysctl vm.pmap
> vm.pmap.pdpe.demotions: 3
> vm.pmap.pde.promotions: 172495
> vm.pmap.pde.p_failures: 2119294
> vm.pmap.pde.mappings: 1927
> vm.pmap.pde.demotions: 126192
> vm.pmap.pcid_save_cnt: 0
> vm.pmap.invpcid_works: 0
> vm.pmap.pcid_enabled: 0
> vm.pmap.pg_ps_enabled: 1
> vm.pmap.pat_works: 1
> 
> This is after vm.pmap.pcid_enabled=0 in loader.conf
> 
> > > > 
> > > > > Perhaps try disabling PCID support (I think vm.pmap.pcid_enabled=0 
> > > > > from
> > > > > loader prompt or loader.conf)?  (Wondering if pmap_activate() is 
> > > > > somehow not switching)
> > > 
> > > I am need some more time to test (day or two), but now this is like
> > > workaround/solution: 12h runtime and peak hour w/o nginx crash.
> > > (vm.pmap.pcid_enabled=0 in loader.conf).
> > 
> > Please try this variation of the previous patch.
> 
> and remove vm.pmap.pcid_enabled=0?
Definitely.

> 
> > diff --git a/sys/vm/vm_map.c b/sys/vm/vm_map.c
> > index a23468e..f754652 100644
> > --- a/sys/vm/vm_map.c

Re: nginx and FreeBSD11

2016-09-20 Thread Slawa Olhovchenkov
On Tue, Sep 20, 2016 at 11:19:25PM +0300, Konstantin Belousov wrote:

> On Tue, Sep 20, 2016 at 10:20:53PM +0300, Slawa Olhovchenkov wrote:
> > On Tue, Sep 20, 2016 at 09:52:44AM +0300, Slawa Olhovchenkov wrote:
> > 
> > > On Mon, Sep 19, 2016 at 06:05:46PM -0700, John Baldwin wrote:
> > > 
> > > > > > If this panics, then vmspace_switch_aio() is not working for
> > > > > > some reason.
> > > > > 
> > > > > I am try using next DTrace script:
> > > > > 
> > > > > #pragma D option dynvarsize=64m
> > > > > 
> > > > > int req[struct vmspace  *, void *];
> > > > > self int trace;
> > > > > 
> > > > > syscall:freebsd:aio_read:entry
> > > > > {
> > > > > this->aio = *(struct aiocb *)copyin(arg0, sizeof(struct 
> > > > > aiocb));
> > > > > req[curthread->td_proc->p_vmspace, this->aio.aio_buf] = 
> > > > > curthread->td_proc->p_pid; 
> > > > > }
> > > > > 
> > > > > fbt:kernel:aio_process_rw:entry
> > > > > {
> > > > > self->job = args[0];
> > > > > self->trace = 1;
> > > > > }
> > > > > 
> > > > > fbt:kernel:aio_process_rw:return
> > > > > /self->trace/
> > > > > {
> > > > > req[self->job->userproc->p_vmspace, 
> > > > > self->job->uaiocb.aio_buf] = 0;
> > > > > self->job = 0;
> > > > > self->trace = 0;
> > > > > }
> > > > > 
> > > > > fbt:kernel:vn_io_fault:entry
> > > > > /self->trace && !req[curthread->td_proc->p_vmspace, 
> > > > > args[1]->uio_iov[0].iov_base]/
> > > > > {
> > > > > this->buf = args[1]->uio_iov[0].iov_base;
> > > > > printf("%Y vn_io_fault %p:%p pid %d\n", walltimestamp, 
> > > > > curthread->td_proc->p_vmspace, this->buf, 
> > > > > req[curthread->td_proc->p_vmspace, this->buf]);
> > > > > }
> > > > > ===
> > > > > 
> > > > > And don't got any messages near nginx core dump.
> > > > > What I can check next?
> > > > > May be check context/address space switch for kernel process?
> > > > 
> > > > Which CPU are you using?
> > > 
> > > CPU: Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz (2000.04-MHz K8-class CPU)
> Is this sandy bridge ?

Sandy Bridge EP

> Show me first 100 lines of the verbose dmesg,

After a day or two, after the end of this test run -- I need to enable verbose boot.

> I want to see cpu features lines.  In particular, does you CPU support
> the INVPCID feature.

CPU: Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz (2000.05-MHz K8-class CPU)
  Origin="GenuineIntel"  Id=0x206d7  Family=0x6  Model=0x2d  Stepping=7
  
Features=0xbfebfbff
  
Features2=0x1fbee3ff
  AMD Features=0x2c100800
  AMD Features2=0x1
  XSAVE Features=0x1
  VT-x: PAT,HLT,MTF,PAUSE,EPT,UG,VPID
  TSC: P-state invariant, performance statistics

I don't see this feature before E5 v3:

CPU: Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz (2600.06-MHz K8-class CPU)
  Origin="GenuineIntel"  Id=0x306e4  Family=0x6  Model=0x3e  Stepping=4
  
Features=0xbfebfbff
  
Features2=0x7fbee3ff
  AMD Features=0x2c100800
  AMD Features2=0x1
  Structured Extended Features=0x281
  XSAVE Features=0x1
  VT-x: PAT,HLT,MTF,PAUSE,EPT,UG,VPID,VID,PostIntr
  TSC: P-state invariant, performance statistics

(I don't run 11.0 on this CPU)

CPU: Intel(R) Xeon(R) CPU E5-2640 v3 @ 2.60GHz (2600.05-MHz K8-class CPU)
  Origin="GenuineIntel"  Id=0x306f2  Family=0x6  Model=0x3f  Stepping=2
  
Features=0xbfebfbff
  
Features2=0x7ffefbff
  AMD Features=0x2c100800
  AMD Features2=0x21
  Structured Extended 
Features=0x37ab
  XSAVE Features=0x1
  VT-x: PAT,HLT,MTF,PAUSE,EPT,UG,VPID,VID,PostIntr
  TSC: P-state invariant, performance statistics

(11.0 runs w/o this issue)

> Also you may show me the 'sysctl vm.pmap' output.

# sysctl vm.pmap
vm.pmap.pdpe.demotions: 3
vm.pmap.pde.promotions: 172495
vm.pmap.pde.p_failures: 2119294
vm.pmap.pde.mappings: 1927
vm.pmap.pde.demotions: 126192
vm.pmap.pcid_save_cnt: 0
vm.pmap.invpcid_works: 0
vm.pmap.pcid_enabled: 0
vm.pmap.pg_ps_enabled: 1
vm.pmap.pat_works: 1

This is after vm.pmap.pcid_enabled=0 in loader.conf

> > > 
> > > > Perhaps try disabling PCID support (I think vm.pmap.pcid_enabled=0 from
> > > > loader prompt or loader.conf)?  (Wondering if pmap_activate() is 
> > > > somehow not switching)
> > 
> > I am need some more time to test (day or two), but now this is like
> > workaround/solution: 12h runtime and peak hour w/o nginx crash.
> > (vm.pmap.pcid_enabled=0 in loader.conf).
> 
> Please try this variation of the previous patch.

and remove vm.pmap.pcid_enabled=0?

> diff --git a/sys/vm/vm_map.c b/sys/vm/vm_map.c
> index a23468e..f754652 100644
> --- a/sys/vm/vm_map.c
> +++ b/sys/vm/vm_map.c
> @@ -481,6 +481,7 @@ vmspace_switch_aio(struct vmspace *newvm)
>   if (oldvm == newvm)
>   return;
>  
> + spinlock_enter();
>   /*
>* Point to the new address space and refer to it.
>*/
> @@ -489,6 +490,7 @@ vmspace_switch_aio(struct vmspace *newvm)
>  
>   /* Activate the new mapping. */
>   pmap_activate(curthread);
> + spinlock_exit();
>  
>   /* Remove the daemon's reference to the old address space. *

Re: nginx and FreeBSD11

2016-09-20 Thread Konstantin Belousov
On Tue, Sep 20, 2016 at 10:20:53PM +0300, Slawa Olhovchenkov wrote:
> On Tue, Sep 20, 2016 at 09:52:44AM +0300, Slawa Olhovchenkov wrote:
> 
> > On Mon, Sep 19, 2016 at 06:05:46PM -0700, John Baldwin wrote:
> > 
> > > > > If this panics, then vmspace_switch_aio() is not working for
> > > > > some reason.
> > > > 
> > > > I am try using next DTrace script:
> > > > 
> > > > #pragma D option dynvarsize=64m
> > > > 
> > > > int req[struct vmspace  *, void *];
> > > > self int trace;
> > > > 
> > > > syscall:freebsd:aio_read:entry
> > > > {
> > > > this->aio = *(struct aiocb *)copyin(arg0, sizeof(struct aiocb));
> > > > req[curthread->td_proc->p_vmspace, this->aio.aio_buf] = 
> > > > curthread->td_proc->p_pid; 
> > > > }
> > > > 
> > > > fbt:kernel:aio_process_rw:entry
> > > > {
> > > > self->job = args[0];
> > > > self->trace = 1;
> > > > }
> > > > 
> > > > fbt:kernel:aio_process_rw:return
> > > > /self->trace/
> > > > {
> > > > req[self->job->userproc->p_vmspace, self->job->uaiocb.aio_buf] 
> > > > = 0;
> > > > self->job = 0;
> > > > self->trace = 0;
> > > > }
> > > > 
> > > > fbt:kernel:vn_io_fault:entry
> > > > /self->trace && !req[curthread->td_proc->p_vmspace, 
> > > > args[1]->uio_iov[0].iov_base]/
> > > > {
> > > > this->buf = args[1]->uio_iov[0].iov_base;
> > > > printf("%Y vn_io_fault %p:%p pid %d\n", walltimestamp, 
> > > > curthread->td_proc->p_vmspace, this->buf, 
> > > > req[curthread->td_proc->p_vmspace, this->buf]);
> > > > }
> > > > ===
> > > > 
> > > > And don't got any messages near nginx core dump.
> > > > What I can check next?
> > > > May be check context/address space switch for kernel process?
> > > 
> > > Which CPU are you using?
> > 
> > CPU: Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz (2000.04-MHz K8-class CPU)
Is this Sandy Bridge?  Show me the first 100 lines of the verbose dmesg;
I want to see the CPU features lines.  In particular, does your CPU support
the INVPCID feature?

Also, please show me the 'sysctl vm.pmap' output.

> > 
> > > Perhaps try disabling PCID support (I think vm.pmap.pcid_enabled=0 from
> > > loader prompt or loader.conf)?  (Wondering if pmap_activate() is somehow 
> > > not switching)
> 
> I am need some more time to test (day or two), but now this is like
> workaround/solution: 12h runtime and peak hour w/o nginx crash.
> (vm.pmap.pcid_enabled=0 in loader.conf).

Please try this variation of the previous patch.

diff --git a/sys/vm/vm_map.c b/sys/vm/vm_map.c
index a23468e..f754652 100644
--- a/sys/vm/vm_map.c
+++ b/sys/vm/vm_map.c
@@ -481,6 +481,7 @@ vmspace_switch_aio(struct vmspace *newvm)
if (oldvm == newvm)
return;
 
+   spinlock_enter();
/*
 * Point to the new address space and refer to it.
 */
@@ -489,6 +490,7 @@ vmspace_switch_aio(struct vmspace *newvm)
 
/* Activate the new mapping. */
pmap_activate(curthread);
+   spinlock_exit();
 
/* Remove the daemon's reference to the old address space. */
KASSERT(oldvm->vm_refcnt > 1,
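
A rough sketch of how a diagnostic patch like the one above would typically be
applied and tested on a source-built stable/11 system (the diff file name and
kernel config name are placeholders, not taken from this thread):

  # save the patch as /root/vmspace-aio-spinlock.diff, then:
  cd /usr/src
  patch -p1 < /root/vmspace-aio-spinlock.diff   # the diff uses a/... b/... paths
  make -j8 buildkernel KERNCONF=GENERIC
  make installkernel KERNCONF=GENERIC
  shutdown -r now                               # then run the nginx+AIO load for 2-3 days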


Re: nginx and FreeBSD11

2016-09-20 Thread Slawa Olhovchenkov
On Tue, Sep 20, 2016 at 09:52:44AM +0300, Slawa Olhovchenkov wrote:

> On Mon, Sep 19, 2016 at 06:05:46PM -0700, John Baldwin wrote:
> 
> > > > If this panics, then vmspace_switch_aio() is not working for
> > > > some reason.
> > > 
> > > I am try using next DTrace script:
> > > 
> > > #pragma D option dynvarsize=64m
> > > 
> > > int req[struct vmspace  *, void *];
> > > self int trace;
> > > 
> > > syscall:freebsd:aio_read:entry
> > > {
> > > this->aio = *(struct aiocb *)copyin(arg0, sizeof(struct aiocb));
> > > req[curthread->td_proc->p_vmspace, this->aio.aio_buf] = 
> > > curthread->td_proc->p_pid; 
> > > }
> > > 
> > > fbt:kernel:aio_process_rw:entry
> > > {
> > > self->job = args[0];
> > > self->trace = 1;
> > > }
> > > 
> > > fbt:kernel:aio_process_rw:return
> > > /self->trace/
> > > {
> > > req[self->job->userproc->p_vmspace, self->job->uaiocb.aio_buf] = 
> > > 0;
> > > self->job = 0;
> > > self->trace = 0;
> > > }
> > > 
> > > fbt:kernel:vn_io_fault:entry
> > > /self->trace && !req[curthread->td_proc->p_vmspace, 
> > > args[1]->uio_iov[0].iov_base]/
> > > {
> > > this->buf = args[1]->uio_iov[0].iov_base;
> > > printf("%Y vn_io_fault %p:%p pid %d\n", walltimestamp, 
> > > curthread->td_proc->p_vmspace, this->buf, 
> > > req[curthread->td_proc->p_vmspace, this->buf]);
> > > }
> > > ===
> > > 
> > > And don't got any messages near nginx core dump.
> > > What I can check next?
> > > May be check context/address space switch for kernel process?
> > 
> > Which CPU are you using?
> 
> CPU: Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz (2000.04-MHz K8-class CPU)
> 
> > Perhaps try disabling PCID support (I think vm.pmap.pcid_enabled=0 from
> > loader prompt or loader.conf)?  (Wondering if pmap_activate() is somehow 
> > not switching)

I need some more time to test (a day or two), but for now this looks like a
workaround/solution: 12h of runtime, including the peak hour, w/o an nginx crash.
(vm.pmap.pcid_enabled=0 in loader.conf).
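
For reference, the workaround mentioned above amounts to a single loader
tunable; a minimal sketch (tunable names as they appear elsewhere in this
thread, effect verifiable after reboot):

  # /boot/loader.conf
  vm.pmap.pcid_enabled="0"

  # after reboot:
  #   sysctl vm.pmap.pcid_enabled   -> 0 (PCID use disabled)
  #   sysctl vm.pmap.invpcid_works  -> 0 on this CPU (no INVPCID instruction)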


Re: nginx and FreeBSD11

2016-09-19 Thread Slawa Olhovchenkov
On Mon, Sep 19, 2016 at 06:05:46PM -0700, John Baldwin wrote:

> > > If this panics, then vmspace_switch_aio() is not working for
> > > some reason.
> > 
> > I am try using next DTrace script:
> > 
> > #pragma D option dynvarsize=64m
> > 
> > int req[struct vmspace  *, void *];
> > self int trace;
> > 
> > syscall:freebsd:aio_read:entry
> > {
> > this->aio = *(struct aiocb *)copyin(arg0, sizeof(struct aiocb));
> > req[curthread->td_proc->p_vmspace, this->aio.aio_buf] = 
> > curthread->td_proc->p_pid; 
> > }
> > 
> > fbt:kernel:aio_process_rw:entry
> > {
> > self->job = args[0];
> > self->trace = 1;
> > }
> > 
> > fbt:kernel:aio_process_rw:return
> > /self->trace/
> > {
> > req[self->job->userproc->p_vmspace, self->job->uaiocb.aio_buf] = 0;
> > self->job = 0;
> > self->trace = 0;
> > }
> > 
> > fbt:kernel:vn_io_fault:entry
> > /self->trace && !req[curthread->td_proc->p_vmspace, 
> > args[1]->uio_iov[0].iov_base]/
> > {
> > this->buf = args[1]->uio_iov[0].iov_base;
> > printf("%Y vn_io_fault %p:%p pid %d\n", walltimestamp, 
> > curthread->td_proc->p_vmspace, this->buf, 
> > req[curthread->td_proc->p_vmspace, this->buf]);
> > }
> > ===
> > 
> > And don't got any messages near nginx core dump.
> > What I can check next?
> > May be check context/address space switch for kernel process?
> 
> Which CPU are you using?

CPU: Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz (2000.04-MHz K8-class CPU)

> Perhaps try disabling PCID support (I think vm.pmap.pcid_enabled=0 from
> loader prompt or loader.conf)?  (Wondering if pmap_activate() is somehow not 
> switching)
> 
> -- 
> John Baldwin


Re: nginx and FreeBSD11

2016-09-19 Thread John Baldwin
On Sunday, September 18, 2016 07:22:41 PM Slawa Olhovchenkov wrote:
> On Thu, Sep 15, 2016 at 10:28:11AM -0700, John Baldwin wrote:
> 
> > On Thursday, September 15, 2016 05:41:03 PM Slawa Olhovchenkov wrote:
> > > On Wed, Sep 07, 2016 at 10:13:48PM +0300, Slawa Olhovchenkov wrote:
> > > 
> > > > I am have strange issuse with nginx on FreeBSD11.
> > > > I am have FreeBSD11 instaled over STABLE-10.
> > > > nginx build for FreeBSD10 and run w/o recompile work fine.
> > > > nginx build for FreeBSD11 crushed inside rbtree lookups: next node
> > > > totaly craped.
> > > > 
> > > > I am see next potential cause:
> > > > 
> > > > 1) clang 3.8 code generation issuse
> > > > 2) system library issuse
> > > > 
> > > > may be i am miss something?
> > > > 
> > > > How to find real cause?
> > > 
> > > I find real cause and this like show-stopper for RELEASE.
> > > I am use nginx with AIO and AIO from one nginx process corrupt memory
> > > from other nginx process. Yes, this is cross-process memory
> > > corruption.
> > > 
> > > Last case, core dumped proccess with pid 1060 at 15:45:14.
> > > Corruped memory at 0x860697000.
> > > I am know about good memory at 0x86067f800.
> > > Dumping (form core) this region to file and analyze by hexdump I am
> > > found start of corrupt region -- offset c8c0 from 0x86067f800.
> > > 0x86067f800+0xc8c0 = 0x86068c0c0
> > > 
> > > I am preliminary enabled debuggin of AIO started operation to nginx
> > > error log (memory address, file name, offset and size of transfer).
> > > 
> > > grep -i 86068c0c0 error.log near 15:45:14 give target file.
> > > grep ce949665cbcd.hls error.log near 15:45:14 give next result:
> > > 
> > > 2016/09/15 15:45:13 [notice] 1055#0: *11659936 AIO_RD 00082065DB60 
> > > start 00086068C0C0 561b0   2646736 ce949665cbcd.hls
> > > 2016/09/15 15:45:14 [notice] 1060#0: *10998125 AIO_RD 00081F1FFB60 
> > > start 00086FF2C0C0 6cdf0 140016832 ce949665cbcd.hls
> > > 2016/09/15 15:45:14 [notice] 1055#0: *11659936 AIO_RD 0008216B6B60 
> > > start 00086472B7C0 7ff70   2999424 ce949665cbcd.hls
> > 
> > Does nginx only use AIO for regular files or does it also use it with 
> > sockets?
> > 
> > You can try using this patch as a diagnostic (you will need to
> > run with INVARIANTS enabled, or at least enabled for vfs_aio.c):
> > 
> > Index: vfs_aio.c
> > ===
> > --- vfs_aio.c   (revision 305811)
> > +++ vfs_aio.c   (working copy)
> > @@ -787,6 +787,8 @@ aio_process_rw(struct kaiocb *job)
> >  * aio_aqueue() acquires a reference to the file that is
> >  * released in aio_free_entry().
> >  */
> > +   KASSERT(curproc->p_vmspace == job->userproc->p_vmspace,
> > +   ("%s: vmspace mismatch", __func__));
> > if (cb->aio_lio_opcode == LIO_READ) {
> > auio.uio_rw = UIO_READ;
> > if (auio.uio_resid == 0)
> > @@ -1054,6 +1056,8 @@ aio_switch_vmspace(struct kaiocb *job)
> >  {
> >  
> > vmspace_switch_aio(job->userproc->p_vmspace);
> > +   KASSERT(curproc->p_vmspace == job->userproc->p_vmspace,
> > +   ("%s: vmspace mismatch", __func__));
> >  }
> > 
> > If this panics, then vmspace_switch_aio() is not working for
> > some reason.
> 
> I am try using next DTrace script:
> 
> #pragma D option dynvarsize=64m
> 
> int req[struct vmspace  *, void *];
> self int trace;
> 
> syscall:freebsd:aio_read:entry
> {
> this->aio = *(struct aiocb *)copyin(arg0, sizeof(struct aiocb));
> req[curthread->td_proc->p_vmspace, this->aio.aio_buf] = 
> curthread->td_proc->p_pid; 
> }
> 
> fbt:kernel:aio_process_rw:entry
> {
> self->job = args[0];
> self->trace = 1;
> }
> 
> fbt:kernel:aio_process_rw:return
> /self->trace/
> {
> req[self->job->userproc->p_vmspace, self->job->uaiocb.aio_buf] = 0;
> self->job = 0;
> self->trace = 0;
> }
> 
> fbt:kernel:vn_io_fault:entry
> /self->trace && !req[curthread->td_proc->p_vmspace, 
> args[1]->uio_iov[0].iov_base]/
> {
> this->buf = args[1]->uio_iov[0].iov_base;
> printf("%Y vn_io_fault %p:%p pid %d\n", walltimestamp, 
> curthread->td_proc->p_vmspace, this->buf, req[curthread->td_proc->p_vmspace, 
> this->buf]);
> }
> ===
> 
> And don't got any messages near nginx core dump.
> What I can check next?
> May be check context/address space switch for kernel process?

Which CPU are you using?  Perhaps try disabling PCID support (I think
vm.pmap.pcid_enabled=0 from the loader prompt or loader.conf)?  (Wondering if
pmap_activate() is somehow not switching.)

-- 
John Baldwin


Re: nginx and FreeBSD11

2016-09-18 Thread Slawa Olhovchenkov
On Thu, Sep 15, 2016 at 10:28:11AM -0700, John Baldwin wrote:

> On Thursday, September 15, 2016 05:41:03 PM Slawa Olhovchenkov wrote:
> > On Wed, Sep 07, 2016 at 10:13:48PM +0300, Slawa Olhovchenkov wrote:
> > 
> > > I have a strange issue with nginx on FreeBSD 11.
> > > I have FreeBSD 11 installed over STABLE-10.
> > > nginx built for FreeBSD 10 and run w/o recompiling works fine.
> > > nginx built for FreeBSD 11 crashes inside rbtree lookups: the next node
> > > is totally corrupted.
> > > 
> > > I see the following potential causes:
> > > 
> > > 1) clang 3.8 code generation issue
> > > 2) system library issue
> > > 
> > > Maybe I am missing something?
> > > 
> > > How do I find the real cause?
> > 
> > I found the real cause, and it looks like a show-stopper for the RELEASE.
> > I use nginx with AIO, and AIO from one nginx process corrupts memory
> > of another nginx process. Yes, this is cross-process memory
> > corruption.
> > 
> > In the last case, the process with pid 1060 dumped core at 15:45:14.
> > Corrupted memory at 0x860697000.
> > I know about good memory at 0x86067f800.
> > Dumping (from the core) this region to a file and analyzing it with hexdump, I
> > found the start of the corrupted region -- offset c8c0 from 0x86067f800.
> > 0x86067f800+0xc8c0 = 0x86068c0c0
> > 
> > I had preliminarily enabled debug logging of started AIO operations to the nginx
> > error log (memory address, file name, offset and size of the transfer).
> > 
> > grep -i 86068c0c0 error.log near 15:45:14 gives the target file.
> > grep ce949665cbcd.hls error.log near 15:45:14 gives the following result:
> > 
> > 2016/09/15 15:45:13 [notice] 1055#0: *11659936 AIO_RD 00082065DB60 
> > start 00086068C0C0 561b0   2646736 ce949665cbcd.hls
> > 2016/09/15 15:45:14 [notice] 1060#0: *10998125 AIO_RD 00081F1FFB60 
> > start 00086FF2C0C0 6cdf0 140016832 ce949665cbcd.hls
> > 2016/09/15 15:45:14 [notice] 1055#0: *11659936 AIO_RD 0008216B6B60 
> > start 00086472B7C0 7ff70   2999424 ce949665cbcd.hls
> 
> Does nginx only use AIO for regular files or does it also use it with sockets?
> 
> You can try using this patch as a diagnostic (you will need to
> run with INVARIANTS enabled, or at least enabled for vfs_aio.c):
> 
> Index: vfs_aio.c
> ===
> --- vfs_aio.c (revision 305811)
> +++ vfs_aio.c (working copy)
> @@ -787,6 +787,8 @@ aio_process_rw(struct kaiocb *job)
>* aio_aqueue() acquires a reference to the file that is
>* released in aio_free_entry().
>*/
> + KASSERT(curproc->p_vmspace == job->userproc->p_vmspace,
> + ("%s: vmspace mismatch", __func__));
>   if (cb->aio_lio_opcode == LIO_READ) {
>   auio.uio_rw = UIO_READ;
>   if (auio.uio_resid == 0)
> @@ -1054,6 +1056,8 @@ aio_switch_vmspace(struct kaiocb *job)
>  {
>  
>   vmspace_switch_aio(job->userproc->p_vmspace);
> + KASSERT(curproc->p_vmspace == job->userproc->p_vmspace,
> + ("%s: vmspace mismatch", __func__));
>  }
> 
> If this panics, then vmspace_switch_aio() is not working for
> some reason.

I have tried using the following DTrace script:

#pragma D option dynvarsize=64m

int req[struct vmspace  *, void *];
self int trace;

syscall:freebsd:aio_read:entry
{
this->aio = *(struct aiocb *)copyin(arg0, sizeof(struct aiocb));
req[curthread->td_proc->p_vmspace, this->aio.aio_buf] = 
curthread->td_proc->p_pid; 
}

fbt:kernel:aio_process_rw:entry
{
self->job = args[0];
self->trace = 1;
}

fbt:kernel:aio_process_rw:return
/self->trace/
{
req[self->job->userproc->p_vmspace, self->job->uaiocb.aio_buf] = 0;
self->job = 0;
self->trace = 0;
}

fbt:kernel:vn_io_fault:entry
/self->trace && !req[curthread->td_proc->p_vmspace, 
args[1]->uio_iov[0].iov_base]/
{
this->buf = args[1]->uio_iov[0].iov_base;
printf("%Y vn_io_fault %p:%p pid %d\n", walltimestamp, 
curthread->td_proc->p_vmspace, this->buf, req[curthread->td_proc->p_vmspace, 
this->buf]);
}
===

And I didn't get any messages near the nginx core dump.
What can I check next?
Maybe check the context/address space switch for the kernel process?
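
For anyone trying to reproduce this, a script like the one above would
typically be run along these lines (the script file name is a placeholder):

  kldload dtraceall            # load the DTrace providers if not already loaded
  dtrace -s aio-vmspace.d      # leave it running while nginx serves AIO load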


Re: nginx and FreeBSD11

2016-09-16 Thread Slawa Olhovchenkov
On Fri, Sep 16, 2016 at 01:17:14PM +0300, Slawa Olhovchenkov wrote:

> On Fri, Sep 16, 2016 at 12:16:17PM +0300, Konstantin Belousov wrote:
> 
> > 
> > vmspace_switch_aio() allows context switching with old curpmap
> > and new proc->p_vmspace. This is a weird condition, where
> > curproc->p_vmspace->vm_pmap is not equal to curcpu->pc_curpmap. I do
> > not see an obvious place which would immediately break, e.g. even
> > for context switch between assignment of newvm to p_vmspace and
> > pmap_activate(), the context-switch call to pmap_activate_sw() seems to
> > do right thing.
> > 
> > Still, just in case, try this
> > 
> > diff --git a/sys/vm/vm_map.c b/sys/vm/vm_map.c
> > index a23468e..fbaa6c1 100644
> > --- a/sys/vm/vm_map.c
> > +++ b/sys/vm/vm_map.c
> > @@ -481,6 +481,7 @@ vmspace_switch_aio(struct vmspace *newvm)
> > if (oldvm == newvm)
> > return;
> >  
> > +   critical_enter();
> > /*
> >  * Point to the new address space and refer to it.
> >  */
> > @@ -489,6 +490,7 @@ vmspace_switch_aio(struct vmspace *newvm)
> >  
> > /* Activate the new mapping. */
> > pmap_activate(curthread);
> > +   critical_exit();
> >  
> > /* Remove the daemon's reference to the old address space. */
> > KASSERT(oldvm->vm_refcnt > 1,
> 
> OK, nginx dumped core, but the kernel didn't crash.
> Now I will try this patch (critical_enter) and reboot.

nginx still dumped core.


Re: nginx and FreeBSD11

2016-09-16 Thread Slawa Olhovchenkov
On Fri, Sep 16, 2016 at 12:16:17PM +0300, Konstantin Belousov wrote:

> 
> vmspace_switch_aio() allows context switching with old curpmap
> and new proc->p_vmspace. This is a weird condition, where
> curproc->p_vmspace->vm_pmap is not equal to curcpu->pc_curpmap. I do
> not see an obvious place which would immediately break, e.g. even
> for context switch between assignment of newvm to p_vmspace and
> pmap_activate(), the context-switch call to pmap_activate_sw() seems to
> do right thing.
> 
> Still, just in case, try this
> 
> diff --git a/sys/vm/vm_map.c b/sys/vm/vm_map.c
> index a23468e..fbaa6c1 100644
> --- a/sys/vm/vm_map.c
> +++ b/sys/vm/vm_map.c
> @@ -481,6 +481,7 @@ vmspace_switch_aio(struct vmspace *newvm)
>   if (oldvm == newvm)
>   return;
>  
> + critical_enter();
>   /*
>* Point to the new address space and refer to it.
>*/
> @@ -489,6 +490,7 @@ vmspace_switch_aio(struct vmspace *newvm)
>  
>   /* Activate the new mapping. */
>   pmap_activate(curthread);
> + critical_exit();
>  
>   /* Remove the daemon's reference to the old address space. */
>   KASSERT(oldvm->vm_refcnt > 1,

OK, nginx dumped core, but the kernel didn't crash.
Now I will try this patch (critical_enter) and reboot.

PS: vi regression: can't exit from vi when there is no space on /tmp.


Re: nginx and FreeBSD11

2016-09-16 Thread Konstantin Belousov
On Thu, Sep 15, 2016 at 11:54:12AM -0700, John Baldwin wrote:
> On Thursday, September 15, 2016 08:49:48 PM Slawa Olhovchenkov wrote:
> > On Thu, Sep 15, 2016 at 10:28:11AM -0700, John Baldwin wrote:
> > 
> > > On Thursday, September 15, 2016 05:41:03 PM Slawa Olhovchenkov wrote:
> > > > On Wed, Sep 07, 2016 at 10:13:48PM +0300, Slawa Olhovchenkov wrote:
> > > > 
> > > > > I am have strange issuse with nginx on FreeBSD11.
> > > > > I am have FreeBSD11 instaled over STABLE-10.
> > > > > nginx build for FreeBSD10 and run w/o recompile work fine.
> > > > > nginx build for FreeBSD11 crushed inside rbtree lookups: next node
> > > > > totaly craped.
> > > > > 
> > > > > I am see next potential cause:
> > > > > 
> > > > > 1) clang 3.8 code generation issuse
> > > > > 2) system library issuse
> > > > > 
> > > > > may be i am miss something?
> > > > > 
> > > > > How to find real cause?
> > > > 
> > > > I find real cause and this like show-stopper for RELEASE.
> > > > I am use nginx with AIO and AIO from one nginx process corrupt memory
> > > > from other nginx process. Yes, this is cross-process memory
> > > > corruption.
> > > > 
> > > > Last case, core dumped proccess with pid 1060 at 15:45:14.
> > > > Corruped memory at 0x860697000.
> > > > I am know about good memory at 0x86067f800.
> > > > Dumping (form core) this region to file and analyze by hexdump I am
> > > > found start of corrupt region -- offset c8c0 from 0x86067f800.
> > > > 0x86067f800+0xc8c0 = 0x86068c0c0
> > > > 
> > > > I am preliminary enabled debuggin of AIO started operation to nginx
> > > > error log (memory address, file name, offset and size of transfer).
> > > > 
> > > > grep -i 86068c0c0 error.log near 15:45:14 give target file.
> > > > grep ce949665cbcd.hls error.log near 15:45:14 give next result:
> > > > 
> > > > 2016/09/15 15:45:13 [notice] 1055#0: *11659936 AIO_RD 00082065DB60 
> > > > start 00086068C0C0 561b0   2646736 ce949665cbcd.hls
> > > > 2016/09/15 15:45:14 [notice] 1060#0: *10998125 AIO_RD 00081F1FFB60 
> > > > start 00086FF2C0C0 6cdf0 140016832 ce949665cbcd.hls
> > > > 2016/09/15 15:45:14 [notice] 1055#0: *11659936 AIO_RD 0008216B6B60 
> > > > start 00086472B7C0 7ff70   2999424 ce949665cbcd.hls
> > > 
> > > Does nginx only use AIO for regular files or does it also use it with 
> > > sockets?
> > 
> > Only for regular files.
> > 
> > > You can try using this patch as a diagnostic (you will need to
> > > run with INVARIANTS enabled,
> > 
> > How much debugs produced?
> > I am have about 5-10K aio's per second.
> > 
> > > or at least enabled for vfs_aio.c):
> > 
> > How I can do this (enable INVARIANTS for vfs_aio.c)?
> 
> Include INVARIANT_SUPPORT in your kernel and add a line with:
> 
> #define INVARIANTS
> 
> at the top of sys/kern/vfs_aio.c.
> 
> > 
> > > Index: vfs_aio.c
> > > ===
> > > --- vfs_aio.c (revision 305811)
> > > +++ vfs_aio.c (working copy)
> > > @@ -787,6 +787,8 @@ aio_process_rw(struct kaiocb *job)
> > >* aio_aqueue() acquires a reference to the file that is
> > >* released in aio_free_entry().
> > >*/
> > > + KASSERT(curproc->p_vmspace == job->userproc->p_vmspace,
> > > + ("%s: vmspace mismatch", __func__));
> > >   if (cb->aio_lio_opcode == LIO_READ) {
> > >   auio.uio_rw = UIO_READ;
> > >   if (auio.uio_resid == 0)
> > > @@ -1054,6 +1056,8 @@ aio_switch_vmspace(struct kaiocb *job)
> > >  {
> > >  
> > >   vmspace_switch_aio(job->userproc->p_vmspace);
> > > + KASSERT(curproc->p_vmspace == job->userproc->p_vmspace,
> > > + ("%s: vmspace mismatch", __func__));
> > >  }
> > > 
> > > If this panics, then vmspace_switch_aio() is not working for
> > > some reason.
> > 
> > This issuse caused rare, this panic produced with issuse or on any aio
> > request? (this is production server)
> 
> It would panic in the case that we are going to write into the wrong
> process (so about as rare as your issue).
> 

vmspace_switch_aio() allows context switching with old curpmap
and new proc->p_vmspace. This is a weird condition, where
curproc->p_vmspace->vm_pmap is not equal to curcpu->pc_curpmap. I do
not see an obvious place which would immediately break, e.g. even
for context switch between assignment of newvm to p_vmspace and
pmap_activate(), the context-switch call to pmap_activate_sw() seems to
do right thing.

Still, just in case, try this

diff --git a/sys/vm/vm_map.c b/sys/vm/vm_map.c
index a23468e..fbaa6c1 100644
--- a/sys/vm/vm_map.c
+++ b/sys/vm/vm_map.c
@@ -481,6 +481,7 @@ vmspace_switch_aio(struct vmspace *newvm)
if (oldvm == newvm)
return;
 
+   critical_enter();
/*
 * Point to the new address space and refer to it.
 */
@@ -489,6 +490,7 @@ vmspace_switch_aio(struct vmspace *newvm)
 
/* Activate the new mapping. */
pmap_activate(curthread);
+   critical_exit();
 
 	/* Remove the daemon's reference to the old address space. */
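
To make the race concrete, here is an illustrative sketch (not the committed
code; reference counting is simplified and the function name is invented) of
the window the critical section closes:

/*
 * Illustrative sketch only: between (1) and (2) a preemption would let the
 * scheduler observe curproc->p_vmspace pointing at the new vmspace while
 * the CPU still runs on the old pmap.  critical_enter()/critical_exit()
 * disable preemption on the local CPU, so the two updates appear atomic to
 * context switches.
 */
static void
vmspace_switch_aio_sketch(struct vmspace *newvm)
{
	struct vmspace *oldvm;

	oldvm = curproc->p_vmspace;
	if (oldvm == newvm)
		return;

	critical_enter();		/* block preemption on this CPU */
	curproc->p_vmspace = newvm;	/* (1) process claims the new address space */
	pmap_activate(curthread);	/* (2) MMU switches to the new pmap */
	critical_exit();		/* preemption allowed again */

	vmspace_free(oldvm);		/* drop the AIO daemon's old reference (simplified) */
}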

Re: nginx and FreeBSD11

2016-09-15 Thread Slawa Olhovchenkov
On Thu, Sep 15, 2016 at 11:54:12AM -0700, John Baldwin wrote:

> On Thursday, September 15, 2016 08:49:48 PM Slawa Olhovchenkov wrote:
> > On Thu, Sep 15, 2016 at 10:28:11AM -0700, John Baldwin wrote:
> > 
> > > On Thursday, September 15, 2016 05:41:03 PM Slawa Olhovchenkov wrote:
> > > > On Wed, Sep 07, 2016 at 10:13:48PM +0300, Slawa Olhovchenkov wrote:
> > > > 
> > > > > I am have strange issuse with nginx on FreeBSD11.
> > > > > I am have FreeBSD11 instaled over STABLE-10.
> > > > > nginx build for FreeBSD10 and run w/o recompile work fine.
> > > > > nginx build for FreeBSD11 crushed inside rbtree lookups: next node
> > > > > totaly craped.
> > > > > 
> > > > > I am see next potential cause:
> > > > > 
> > > > > 1) clang 3.8 code generation issuse
> > > > > 2) system library issuse
> > > > > 
> > > > > may be i am miss something?
> > > > > 
> > > > > How to find real cause?
> > > > 
> > > > I find real cause and this like show-stopper for RELEASE.
> > > > I am use nginx with AIO and AIO from one nginx process corrupt memory
> > > > from other nginx process. Yes, this is cross-process memory
> > > > corruption.
> > > > 
> > > > Last case, core dumped proccess with pid 1060 at 15:45:14.
> > > > Corruped memory at 0x860697000.
> > > > I am know about good memory at 0x86067f800.
> > > > Dumping (form core) this region to file and analyze by hexdump I am
> > > > found start of corrupt region -- offset c8c0 from 0x86067f800.
> > > > 0x86067f800+0xc8c0 = 0x86068c0c0
> > > > 
> > > > I am preliminary enabled debuggin of AIO started operation to nginx
> > > > error log (memory address, file name, offset and size of transfer).
> > > > 
> > > > grep -i 86068c0c0 error.log near 15:45:14 give target file.
> > > > grep ce949665cbcd.hls error.log near 15:45:14 give next result:
> > > > 
> > > > 2016/09/15 15:45:13 [notice] 1055#0: *11659936 AIO_RD 00082065DB60 
> > > > start 00086068C0C0 561b0   2646736 ce949665cbcd.hls
> > > > 2016/09/15 15:45:14 [notice] 1060#0: *10998125 AIO_RD 00081F1FFB60 
> > > > start 00086FF2C0C0 6cdf0 140016832 ce949665cbcd.hls
> > > > 2016/09/15 15:45:14 [notice] 1055#0: *11659936 AIO_RD 0008216B6B60 
> > > > start 00086472B7C0 7ff70   2999424 ce949665cbcd.hls
> > > 
> > > Does nginx only use AIO for regular files or does it also use it with 
> > > sockets?
> > 
> > Only for regular files.
> > 
> > > You can try using this patch as a diagnostic (you will need to
> > > run with INVARIANTS enabled,
> > 
> > How much debugs produced?
> > I am have about 5-10K aio's per second.
> > 
> > > or at least enabled for vfs_aio.c):
> > 
> > How I can do this (enable INVARIANTS for vfs_aio.c)?
> 
> Include INVARIANT_SUPPORT in your kernel and add a line with:
> 
> #define INVARIANTS
> 
> at the top of sys/kern/vfs_aio.c.

# sysctl -a | grep -i inva
kern.timecounter.invariant_tsc: 1
kern.features.invariant_support: 1
options INVARIANT_SUPPORT

but the string `vmspace mismatch` does not appear in the kernel.
Maybe I am missing something?

#define INVARIANTS

#include 
__FBSDID("$FreeBSD: stable/11/sys/kern/vfs_aio.c 304738 2016-08-24
09:18:38Z kib $");

#include "opt_compat.h"

#include 
#include 
#include 
#include 
#include 


Re: nginx and FreeBSD11

2016-09-15 Thread John Baldwin
On Thursday, September 15, 2016 10:09:48 PM Slawa Olhovchenkov wrote:
> On Thu, Sep 15, 2016 at 11:54:12AM -0700, John Baldwin wrote:
> 
> > > > Index: vfs_aio.c
> > > > ===
> > > > --- vfs_aio.c   (revision 305811)
> > > > +++ vfs_aio.c   (working copy)
> > > > @@ -787,6 +787,8 @@ aio_process_rw(struct kaiocb *job)
> > > >  * aio_aqueue() acquires a reference to the file that is
> > > >  * released in aio_free_entry().
> > > >  */
> > > > +   KASSERT(curproc->p_vmspace == job->userproc->p_vmspace,
> > > > +   ("%s: vmspace mismatch", __func__));
> > > > if (cb->aio_lio_opcode == LIO_READ) {
> > > > auio.uio_rw = UIO_READ;
> > > > if (auio.uio_resid == 0)
> > > > @@ -1054,6 +1056,8 @@ aio_switch_vmspace(struct kaiocb *job)
> > > >  {
> > > >  
> > > > vmspace_switch_aio(job->userproc->p_vmspace);
> > > > +   KASSERT(curproc->p_vmspace == job->userproc->p_vmspace,
> > > > +   ("%s: vmspace mismatch", __func__));
> > > >  }
> > > > 
> > > > If this panics, then vmspace_switch_aio() is not working for
> > > > some reason.
> > > 
> > > This issuse caused rare, this panic produced with issuse or on any aio
> > > request? (this is production server)
> > 
> > It would panic in the case that we are going to write into the wrong
> > process (so about as rare as your issue).
> 
> Can I configure automatic reboot (not halted) in this case?

FreeBSD in a stable branch should already reboot (after writing out a dump)
by default unless you have configured it otherwise.
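
A minimal configuration sketch for capturing that dump (values are
illustrative; adjust device names and paths to your system):

# /etc/rc.conf -- enable kernel crash dumps
dumpdev="AUTO"          # let rc pick the swap device for the kernel dump
dumpdir="/var/crash"    # where savecore(8) writes the vmcore on the next boot

# /etc/sysctl.conf -- reboot on panic instead of sitting in the debugger
debug.debugger_on_panic=0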

-- 
John Baldwin


Re: nginx and FreeBSD11

2016-09-15 Thread Eugene Grosbein

It would panic in the case that we are going to write into the wrong
process (so about as rare as your issue).


Can I configure automatic reboot (not halted) in this case?


options KDB_UNATTENDED

configures the kernel for an automatic reboot after a panic.
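
A kernel configuration sketch (the config name is hypothetical; KDB and DDB
are already present in GENERIC):

include GENERIC
ident   UNATTENDED
options KDB_UNATTENDED  # on panic: write the dump and reboot, do not enter ddb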





Re: nginx and FreeBSD11

2016-09-15 Thread Slawa Olhovchenkov
On Thu, Sep 15, 2016 at 11:54:12AM -0700, John Baldwin wrote:

> > > Index: vfs_aio.c
> > > ===
> > > --- vfs_aio.c (revision 305811)
> > > +++ vfs_aio.c (working copy)
> > > @@ -787,6 +787,8 @@ aio_process_rw(struct kaiocb *job)
> > >* aio_aqueue() acquires a reference to the file that is
> > >* released in aio_free_entry().
> > >*/
> > > + KASSERT(curproc->p_vmspace == job->userproc->p_vmspace,
> > > + ("%s: vmspace mismatch", __func__));
> > >   if (cb->aio_lio_opcode == LIO_READ) {
> > >   auio.uio_rw = UIO_READ;
> > >   if (auio.uio_resid == 0)
> > > @@ -1054,6 +1056,8 @@ aio_switch_vmspace(struct kaiocb *job)
> > >  {
> > >  
> > >   vmspace_switch_aio(job->userproc->p_vmspace);
> > > + KASSERT(curproc->p_vmspace == job->userproc->p_vmspace,
> > > + ("%s: vmspace mismatch", __func__));
> > >  }
> > > 
> > > If this panics, then vmspace_switch_aio() is not working for
> > > some reason.
> > 
> > This issuse caused rare, this panic produced with issuse or on any aio
> > request? (this is production server)
> 
> It would panic in the case that we are going to write into the wrong
> process (so about as rare as your issue).

Can I configure automatic reboot (not halted) in this case?


Re: nginx and FreeBSD11

2016-09-15 Thread John Baldwin
On Thursday, September 15, 2016 08:49:48 PM Slawa Olhovchenkov wrote:
> On Thu, Sep 15, 2016 at 10:28:11AM -0700, John Baldwin wrote:
> 
> > On Thursday, September 15, 2016 05:41:03 PM Slawa Olhovchenkov wrote:
> > > On Wed, Sep 07, 2016 at 10:13:48PM +0300, Slawa Olhovchenkov wrote:
> > > 
> > > > I am have strange issuse with nginx on FreeBSD11.
> > > > I am have FreeBSD11 instaled over STABLE-10.
> > > > nginx build for FreeBSD10 and run w/o recompile work fine.
> > > > nginx build for FreeBSD11 crushed inside rbtree lookups: next node
> > > > totaly craped.
> > > > 
> > > > I am see next potential cause:
> > > > 
> > > > 1) clang 3.8 code generation issuse
> > > > 2) system library issuse
> > > > 
> > > > may be i am miss something?
> > > > 
> > > > How to find real cause?
> > > 
> > > I find real cause and this like show-stopper for RELEASE.
> > > I am use nginx with AIO and AIO from one nginx process corrupt memory
> > > from other nginx process. Yes, this is cross-process memory
> > > corruption.
> > > 
> > > Last case, core dumped proccess with pid 1060 at 15:45:14.
> > > Corruped memory at 0x860697000.
> > > I am know about good memory at 0x86067f800.
> > > Dumping (form core) this region to file and analyze by hexdump I am
> > > found start of corrupt region -- offset c8c0 from 0x86067f800.
> > > 0x86067f800+0xc8c0 = 0x86068c0c0
> > > 
> > > I am preliminary enabled debuggin of AIO started operation to nginx
> > > error log (memory address, file name, offset and size of transfer).
> > > 
> > > grep -i 86068c0c0 error.log near 15:45:14 give target file.
> > > grep ce949665cbcd.hls error.log near 15:45:14 give next result:
> > > 
> > > 2016/09/15 15:45:13 [notice] 1055#0: *11659936 AIO_RD 00082065DB60 
> > > start 00086068C0C0 561b0   2646736 ce949665cbcd.hls
> > > 2016/09/15 15:45:14 [notice] 1060#0: *10998125 AIO_RD 00081F1FFB60 
> > > start 00086FF2C0C0 6cdf0 140016832 ce949665cbcd.hls
> > > 2016/09/15 15:45:14 [notice] 1055#0: *11659936 AIO_RD 0008216B6B60 
> > > start 00086472B7C0 7ff70   2999424 ce949665cbcd.hls
> > 
> > Does nginx only use AIO for regular files or does it also use it with 
> > sockets?
> 
> Only for regular files.
> 
> > You can try using this patch as a diagnostic (you will need to
> > run with INVARIANTS enabled,
> 
> How much debugs produced?
> I am have about 5-10K aio's per second.
> 
> > or at least enabled for vfs_aio.c):
> 
> How I can do this (enable INVARIANTS for vfs_aio.c)?

Include INVARIANT_SUPPORT in your kernel and add a line with:

#define INVARIANTS

at the top of sys/kern/vfs_aio.c.
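
For example (a sketch; the config name is hypothetical), the kernel config
gains:

include GENERIC
ident   AIODEBUG
options INVARIANT_SUPPORT   # provides the panic plumbing KASSERT needs

and sys/kern/vfs_aio.c then starts with:

#define INVARIANTS          /* enable KASSERT in this one file */
#include <sys/cdefs.h>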

> 
> > Index: vfs_aio.c
> > ===
> > --- vfs_aio.c   (revision 305811)
> > +++ vfs_aio.c   (working copy)
> > @@ -787,6 +787,8 @@ aio_process_rw(struct kaiocb *job)
> >  * aio_aqueue() acquires a reference to the file that is
> >  * released in aio_free_entry().
> >  */
> > +   KASSERT(curproc->p_vmspace == job->userproc->p_vmspace,
> > +   ("%s: vmspace mismatch", __func__));
> > if (cb->aio_lio_opcode == LIO_READ) {
> > auio.uio_rw = UIO_READ;
> > if (auio.uio_resid == 0)
> > @@ -1054,6 +1056,8 @@ aio_switch_vmspace(struct kaiocb *job)
> >  {
> >  
> > vmspace_switch_aio(job->userproc->p_vmspace);
> > +   KASSERT(curproc->p_vmspace == job->userproc->p_vmspace,
> > +   ("%s: vmspace mismatch", __func__));
> >  }
> > 
> > If this panics, then vmspace_switch_aio() is not working for
> > some reason.
> 
> This issuse caused rare, this panic produced with issuse or on any aio
> request? (this is production server)

It would panic in the case that we are going to write into the wrong
process (so about as rare as your issue).

-- 
John Baldwin


Re: nginx and FreeBSD11

2016-09-15 Thread Slawa Olhovchenkov
On Thu, Sep 15, 2016 at 10:28:11AM -0700, John Baldwin wrote:

> On Thursday, September 15, 2016 05:41:03 PM Slawa Olhovchenkov wrote:
> > On Wed, Sep 07, 2016 at 10:13:48PM +0300, Slawa Olhovchenkov wrote:
> > 
> > > I am have strange issuse with nginx on FreeBSD11.
> > > I am have FreeBSD11 instaled over STABLE-10.
> > > nginx build for FreeBSD10 and run w/o recompile work fine.
> > > nginx build for FreeBSD11 crushed inside rbtree lookups: next node
> > > totaly craped.
> > > 
> > > I am see next potential cause:
> > > 
> > > 1) clang 3.8 code generation issuse
> > > 2) system library issuse
> > > 
> > > may be i am miss something?
> > > 
> > > How to find real cause?
> > 
> > I find real cause and this like show-stopper for RELEASE.
> > I am use nginx with AIO and AIO from one nginx process corrupt memory
> > from other nginx process. Yes, this is cross-process memory
> > corruption.
> > 
> > Last case, core dumped proccess with pid 1060 at 15:45:14.
> > Corruped memory at 0x860697000.
> > I am know about good memory at 0x86067f800.
> > Dumping (form core) this region to file and analyze by hexdump I am
> > found start of corrupt region -- offset c8c0 from 0x86067f800.
> > 0x86067f800+0xc8c0 = 0x86068c0c0
> > 
> > I am preliminary enabled debuggin of AIO started operation to nginx
> > error log (memory address, file name, offset and size of transfer).
> > 
> > grep -i 86068c0c0 error.log near 15:45:14 give target file.
> > grep ce949665cbcd.hls error.log near 15:45:14 give next result:
> > 
> > 2016/09/15 15:45:13 [notice] 1055#0: *11659936 AIO_RD 00082065DB60 
> > start 00086068C0C0 561b0   2646736 ce949665cbcd.hls
> > 2016/09/15 15:45:14 [notice] 1060#0: *10998125 AIO_RD 00081F1FFB60 
> > start 00086FF2C0C0 6cdf0 140016832 ce949665cbcd.hls
> > 2016/09/15 15:45:14 [notice] 1055#0: *11659936 AIO_RD 0008216B6B60 
> > start 00086472B7C0 7ff70   2999424 ce949665cbcd.hls
> 
> Does nginx only use AIO for regular files or does it also use it with sockets?

Only for regular files.
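
For reference, the relevant part of an nginx configuration that exercises
this path looks roughly like this (illustrative values; nginx must be built
with --with-file-aio for aio to work on FreeBSD):

location /hls/ {
    aio            on;         # read file data via POSIX AIO
    directio       512k;       # reads above this size bypass the buffer cache
    output_buffers 1 512k;     # user-space buffer the AIO read completes into
    sendfile       off;
}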

> You can try using this patch as a diagnostic (you will need to
> run with INVARIANTS enabled,

How much debug output will be produced?
I have about 5-10K AIOs per second.

> or at least enabled for vfs_aio.c):

How can I do this (enable INVARIANTS for vfs_aio.c)?

> Index: vfs_aio.c
> ===
> --- vfs_aio.c (revision 305811)
> +++ vfs_aio.c (working copy)
> @@ -787,6 +787,8 @@ aio_process_rw(struct kaiocb *job)
>* aio_aqueue() acquires a reference to the file that is
>* released in aio_free_entry().
>*/
> + KASSERT(curproc->p_vmspace == job->userproc->p_vmspace,
> + ("%s: vmspace mismatch", __func__));
>   if (cb->aio_lio_opcode == LIO_READ) {
>   auio.uio_rw = UIO_READ;
>   if (auio.uio_resid == 0)
> @@ -1054,6 +1056,8 @@ aio_switch_vmspace(struct kaiocb *job)
>  {
>  
>   vmspace_switch_aio(job->userproc->p_vmspace);
> + KASSERT(curproc->p_vmspace == job->userproc->p_vmspace,
> + ("%s: vmspace mismatch", __func__));
>  }
> 
> If this panics, then vmspace_switch_aio() is not working for
> some reason.

This issue occurs rarely; will this panic fire only when the issue occurs, or on any
AIO request? (This is a production server.)


Re: nginx and FreeBSD11

2016-09-15 Thread John Baldwin
On Thursday, September 15, 2016 05:41:03 PM Slawa Olhovchenkov wrote:
> On Wed, Sep 07, 2016 at 10:13:48PM +0300, Slawa Olhovchenkov wrote:
> 
> > I am have strange issuse with nginx on FreeBSD11.
> > I am have FreeBSD11 instaled over STABLE-10.
> > nginx build for FreeBSD10 and run w/o recompile work fine.
> > nginx build for FreeBSD11 crushed inside rbtree lookups: next node
> > totaly craped.
> > 
> > I am see next potential cause:
> > 
> > 1) clang 3.8 code generation issuse
> > 2) system library issuse
> > 
> > may be i am miss something?
> > 
> > How to find real cause?
> 
> I find real cause and this like show-stopper for RELEASE.
> I am use nginx with AIO and AIO from one nginx process corrupt memory
> from other nginx process. Yes, this is cross-process memory
> corruption.
> 
> Last case, core dumped proccess with pid 1060 at 15:45:14.
> Corruped memory at 0x860697000.
> I am know about good memory at 0x86067f800.
> Dumping (form core) this region to file and analyze by hexdump I am
> found start of corrupt region -- offset c8c0 from 0x86067f800.
> 0x86067f800+0xc8c0 = 0x86068c0c0
> 
> I am preliminary enabled debuggin of AIO started operation to nginx
> error log (memory address, file name, offset and size of transfer).
> 
> grep -i 86068c0c0 error.log near 15:45:14 give target file.
> grep ce949665cbcd.hls error.log near 15:45:14 give next result:
> 
> 2016/09/15 15:45:13 [notice] 1055#0: *11659936 AIO_RD 00082065DB60 start 
> 00086068C0C0 561b0   2646736 ce949665cbcd.hls
> 2016/09/15 15:45:14 [notice] 1060#0: *10998125 AIO_RD 00081F1FFB60 start 
> 00086FF2C0C0 6cdf0 140016832 ce949665cbcd.hls
> 2016/09/15 15:45:14 [notice] 1055#0: *11659936 AIO_RD 0008216B6B60 start 
> 00086472B7C0 7ff70   2999424 ce949665cbcd.hls

Does nginx only use AIO for regular files or does it also use it with sockets?

You can try using this patch as a diagnostic (you will need to
run with INVARIANTS enabled, or at least enabled for vfs_aio.c):

Index: vfs_aio.c
===
--- vfs_aio.c   (revision 305811)
+++ vfs_aio.c   (working copy)
@@ -787,6 +787,8 @@ aio_process_rw(struct kaiocb *job)
 * aio_aqueue() acquires a reference to the file that is
 * released in aio_free_entry().
 */
+   KASSERT(curproc->p_vmspace == job->userproc->p_vmspace,
+   ("%s: vmspace mismatch", __func__));
if (cb->aio_lio_opcode == LIO_READ) {
auio.uio_rw = UIO_READ;
if (auio.uio_resid == 0)
@@ -1054,6 +1056,8 @@ aio_switch_vmspace(struct kaiocb *job)
 {
 
vmspace_switch_aio(job->userproc->p_vmspace);
+   KASSERT(curproc->p_vmspace == job->userproc->p_vmspace,
+   ("%s: vmspace mismatch", __func__));
 }

If this panics, then vmspace_switch_aio() is not working for
some reason.
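
KASSERT() only fires when INVARIANTS is defined; roughly (paraphrasing
sys/systm.h):

#ifdef INVARIANTS
#define	KASSERT(exp, msg) do {			\
	if (__predict_false(!(exp)))		\
		panic msg;			\
} while (0)
#else
#define	KASSERT(exp, msg) do { } while (0)
#endif

So with INVARIANTS compiled in, a vmspace mismatch panics the box (which then
dumps and reboots); without it, the check compiles to nothing.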

-- 
John Baldwin


Re: nginx and FreeBSD11

2016-09-15 Thread Slawa Olhovchenkov
On Thu, Sep 15, 2016 at 09:52:30AM -0700, jungle Boogie wrote:

> On 15 September 2016 at 07:41, Slawa Olhovchenkov  wrote:
> > Bingo!
> > aio read file by process 1055 placed to same memory address as requested 
> > but in memory space of process 1060!
> >
> > This is kernel bug and this bug must be stoped release.
> 
> When was it introduced?

I tried 11.x at r305117 and haven't tried other releases.
I don't see it on the 10.x line.
AIO was refactored in 11.x.


Re: nginx and FreeBSD11

2016-09-15 Thread jungle Boogie
On 15 September 2016 at 07:41, Slawa Olhovchenkov  wrote:
> Bingo!
> aio read file by process 1055 placed to same memory address as requested but 
> in memory space of process 1060!
>
> This is kernel bug and this bug must be stoped release.

When was it introduced?


-- 
---
inum: 883510009027723
sip: jungleboo...@sip2sip.info


Re: nginx and FreeBSD11

2016-09-15 Thread Slawa Olhovchenkov
On Wed, Sep 07, 2016 at 10:13:48PM +0300, Slawa Olhovchenkov wrote:

> I am have strange issuse with nginx on FreeBSD11.
> I am have FreeBSD11 instaled over STABLE-10.
> nginx build for FreeBSD10 and run w/o recompile work fine.
> nginx build for FreeBSD11 crushed inside rbtree lookups: next node
> totaly craped.
> 
> I am see next potential cause:
> 
> 1) clang 3.8 code generation issuse
> 2) system library issuse
> 
> may be i am miss something?
> 
> How to find real cause?

I found the real cause, and it looks like a show-stopper for the RELEASE.
I use nginx with AIO, and AIO from one nginx process corrupts memory
of another nginx process. Yes, this is cross-process memory
corruption.

In the last case, the process with pid 1060 dumped core at 15:45:14.
Corrupted memory at 0x860697000.
I know the memory at 0x86067f800 is good.
Dumping (from the core) this region to a file and analyzing it with hexdump, I
found the start of the corrupted region: offset c8c0 from 0x86067f800.
0x86067f800 + 0xc8c0 = 0x86068c0c0

I had previously enabled debug logging of started AIO operations to the nginx
error log (memory address, file name, offset and size of transfer).

grep -i 86068c0c0 error.log near 15:45:14 gives the target file.
grep ce949665cbcd.hls error.log near 15:45:14 gives the following result:

2016/09/15 15:45:13 [notice] 1055#0: *11659936 AIO_RD 00082065DB60 start 
00086068C0C0 561b0   2646736 ce949665cbcd.hls
2016/09/15 15:45:14 [notice] 1060#0: *10998125 AIO_RD 00081F1FFB60 start 
00086FF2C0C0 6cdf0 140016832 ce949665cbcd.hls
2016/09/15 15:45:14 [notice] 1055#0: *11659936 AIO_RD 0008216B6B60 start 
00086472B7C0 7ff70   2999424 ce949665cbcd.hls

0x860697000-0x86068c0c0 = 0xaf40

from memory dump:
af00  5c 81 4d 7c 0b b6 81 f2  c8 a5 df 94 08 43 c1 08  |\.M|.C..|
af10  74 00 57 55 5f 15 11 b1  00 d5 29 6a 4e d2 fd fb  |t.WU_.)jN...|
af20  49 d1 fd 98 49 58 b7 66  c2 c9 64 67 30 05 06 c0  |I...IX.f..dg0...|
af30  0e b2 64 fa b7 9f 69 69  fc cd 91 82 83 ba c3 f2  |..d...ii|
af40  b7 34 eb 8e 0e 88 40 60  1b a8 71 7a 12 15 26 d3  |.4@`..qz..&.|
af50  7f 3e 80 e9 74 96 30 24  cb 82 88 8a ea e0 45 10  |.>..t.0$..E.|
af60  e5 75 b2 f7 5b 7c 83 fa  95 a9 09 80 0a 8c fd a9  |.u..[|..|
af70  ef 30 f6 68 9c b2 3f ae  2e e5 21 79 78 8b 34 36  |.0.h..?...!yx.46|
af80  c6 55 16 a2 47 00 ca 13  9c 8e 2c 6b eb c7 4f 51  |.U..G.,k..OQ|
af90  81 80 71 f3 a5 9a 5f 40  54 9c f1 f9 ba 81 b2 82  |..q..._@T...|

from the disk file (offsets relative to 2646736):
af00  5c 81 4d 7c 0b b6 81 f2  c8 a5 df 94 08 43 c1 08  |\.M|.C..|
af10  74 00 57 55 5f 15 11 b1  00 d5 29 6a 4e d2 fd fb  |t.WU_.)jN...|
af20  49 d1 fd 98 49 58 b7 66  c2 c9 64 67 30 05 06 c0  |I...IX.f..dg0...|
af30  0e b2 64 fa b7 9f 69 69  fc cd 91 82 83 ba c3 f2  |..d...ii|
af40  b7 34 eb 8e 0e 88 40 60  1b a8 71 7a 12 15 26 d3  |.4@`..qz..&.|
af50  7f 3e 80 e9 74 96 30 24  cb 82 88 8a ea e0 45 10  |.>..t.0$..E.|
af60  e5 75 b2 f7 5b 7c 83 fa  95 a9 09 80 0a 8c fd a9  |.u..[|..|
af70  ef 30 f6 68 9c b2 3f ae  2e e5 21 79 78 8b 34 36  |.0.h..?...!yx.46|
af80  c6 55 16 a2 47 00 ca 13  9c 8e 2c 6b eb c7 4f 51  |.U..G.,k..OQ|
af90  81 80 71 f3 a5 9a 5f 40  54 9c f1 f9 ba 81 b2 82  |..q..._@T...|

Bingo!
The file data read via AIO by process 1055 was placed at the requested memory address,
but in the memory space of process 1060!

This is a kernel bug, and it must be a show-stopper for the release.
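
The comparison above can be reproduced with something like the following (all
paths, addresses and sizes are the example values from this case):

# Dump the AIO destination buffer (start 0x86068c0c0, length 0x561b0) from the
# core of pid 1060:
gdb /usr/local/sbin/nginx nginx-1060.core \
    -ex 'dump binary memory /tmp/mem.bin 0x86068c0c0 0x8606e2270' -ex quit

# Extract the same window from the file on disk (offset 2646736, 0x561b0 = 352688 bytes):
dd if=ce949665cbcd.hls of=/tmp/file.bin bs=1 skip=2646736 count=352688 2>/dev/null

# Matching bytes show that the read requested by pid 1055 landed in pid 1060's memory:
cmp /tmp/mem.bin /tmp/file.bin && echo "identical: cross-process corruption confirmed"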


Re: nginx and FreeBSD11

2016-09-07 Thread Slawa Olhovchenkov
On Wed, Sep 07, 2016 at 08:29:04PM +0100, Matt Smith wrote:

> On Sep 07 22:13, Slawa Olhovchenkov wrote:
> >I am have strange issuse with nginx on FreeBSD11.
> >I am have FreeBSD11 instaled over STABLE-10.
> >nginx build for FreeBSD10 and run w/o recompile work fine.

Sorry, it also crashed.

> >nginx build for FreeBSD11 crushed inside rbtree lookups: next node
> >totaly craped.
> >
> >I am see next potential cause:
> >
> >1) clang 3.8 code generation issuse
> >2) system library issuse
> >
> >may be i am miss something?
> >
> >How to find real cause?
> 
> I am running stable/11 on amd64 and using the nginx-devel port with 
> libressl-devel and I have no issues at all. It works fine. Have you 
> rebuilt *all* ports since updating from stable/10?

I reinstalled everything from my local poudriere.
The crash is not frequent; it takes about 10M requests before a crash.

#0  ngx_open_file_lookup (cache=0x80321ae08, name=0x7fffdf50, 
hash=4059963935) at src/core/ngx_open_file_cache.c:1199
1199            if (hash < node->key) {
[New Thread 802218000 (LWP 102190/)]
(gdb) bt
#0  ngx_open_file_lookup (cache=0x80321ae08, name=0x7fffdf50, 
hash=4059963935) at src/core/ngx_open_file_cache.c:1199
#1  0x004483c7 in ngx_open_cached_file (cache=0x80321ae08, 
name=0x7fffdf50, of=0x7fffded8, pool=0x8109bd000) at 
src/core/ngx_open_file_cache.c:209
#2  0x0051aa52 in ngx_http_unite_handler (r=0x8109bd050) at 
/wrkdirs/usr/ports/www/nginx-devel/work/ngx_unite-1.2/ngx_unite_module.c:147
#3  0x00474b77 in ngx_http_core_content_phase (r=0x8109bd050, 
ph=0x80325c7b0) at src/http/ngx_http_core_module.c:1375
#4  0x0047208f in ngx_http_core_run_phases (r=0x8109bd050) at 
src/http/ngx_http_core_module.c:845
#5  0x004744aa in ngx_http_named_location (r=0x8109bd050, 
name=0x810a7e4e8) at src/http/ngx_http_core_module.c:2676
#6  0x00080263f353 in ngx_http_lua_handle_exec (L=0x1378, r=0x8109bd050, 
ctx=0x810a7e3a0) at 
/wrkdirs/usr/ports/www/nginx-devel/work/lua-nginx-module-0.10.6rc1/src/ngx_http_lua_util.c:2157
#7  0x00080263df09 in ngx_http_lua_run_thread (L=0x1378, r=0x8109bd050, 
ctx=0x810a7e3a0, nrets=0) at 
/wrkdirs/usr/ports/www/nginx-devel/work/lua-nginx-module-0.10.6rc1/src/ngx_http_lua_util.c:1056
#8  0x000802647114 in ngx_http_lua_access_by_chunk (L=0x1378, 
r=0x8109bd050) at 
/wrkdirs/usr/ports/www/nginx-devel/work/lua-nginx-module-0.10.6rc1/src/ngx_http_lua_accessby.c:332
#9  0x0008026473e1 in ngx_http_lua_access_handler_file (r=0x8109bd050) at 
/wrkdirs/usr/ports/www/nginx-devel/work/lua-nginx-module-0.10.6rc1/src/ngx_http_lua_accessby.c:232
#10 0x000802646b75 in ngx_http_lua_access_handler (r=0x8109bd050) at 
/wrkdirs/usr/ports/www/nginx-devel/work/lua-nginx-module-0.10.6rc1/src/ngx_http_lua_accessby.c:163
#11 0x00473163 in ngx_http_core_access_phase (r=0x8109bd050, 
ph=0x80325c780) at src/http/ngx_http_core_module.c:1071
#12 0x0047208f in ngx_http_core_run_phases (r=0x8109bd050) at 
src/http/ngx_http_core_module.c:845
#13 0x00472007 in ngx_http_handler (r=0x8109bd050) at 
src/http/ngx_http_core_module.c:828
#14 0x00484424 in ngx_http_process_request (r=0x8109bd050) at 
src/http/ngx_http_request.c:1914
#15 0x004865c6 in ngx_http_process_request_headers (rev=0x808811168) at 
src/http/ngx_http_request.c:1346
#16 0x00485bbd in ngx_http_process_request_line (rev=0x808811168) at 
src/http/ngx_http_request.c:1026
#17 0x004880d2 in ngx_http_keepalive_handler (rev=0x808811168) at 
src/http/ngx_http_request.c:3191
#18 0x0045f703 in ngx_kqueue_process_events (cycle=0x8022c0050, 
timer=176, flags=1) at src/event/modules/ngx_kqueue_module.c:669
#19 0x0044e02a in ngx_process_events_and_timers (cycle=0x8022c0050) at 
src/event/ngx_event.c:242
#20 0x0045c54a in ngx_worker_process_cycle (cycle=0x8022c0050, 
data=0x2) at src/os/unix/ngx_process_cycle.c:753
#21 0x0045965d in ngx_spawn_process (cycle=0x8022c0050, proc=0x45c440 
, data=0x2, name=0x51feba "worker process", 
respawn=-3) at src/os/unix/ngx_process.c:198
#22 0x0045b240 in ngx_start_worker_processes (cycle=0x8022c0050, n=6, 
type=-3) at src/os/unix/ngx_process_cycle.c:358
#23 0x0045ab08 in ngx_master_process_cycle (cycle=0x8022c0050) at 
src/os/unix/ngx_process_cycle.c:130
#24 0x004129f2 in main (argc=1, argv=0x7fffeb20) at 
src/core/nginx.c:367
Current language:  auto; currently minimal

(gdb) p *cache
$1 = {rbtree = {root = 0x83206f360, sentinel = 0x80321ae20, insert = 0x447ce0 
}, sentinel = {key = 0, left = 0x0, 
right = 0x0, parent = 0x816dd4740, color = 0 '\0', data = 0 '\0'}, expire_queue 
= {prev = 0x8315a9208, next = 0x83ea77a48}, current = 3747, max = 8, 
inactive = 20}
(gdb) p *cache->rbtree->root
$2 = {key = 2738298451, left = 0x841e8a340, right = 0x816dd12c0, parent = 0x0, 
color = 0 '\0', data = 188 'ΒΌ'}
(gdb) p *cache->rbtree->root->right
$3 = {key = 3543292973, left = 0x80c3f7200, righ
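
The crash site is an ordinary red-black-tree descent; a simplified sketch
(modeled on the nginx rbtree conventions, not the exact nginx source) shows
why a single overwritten node is enough to fault:

#include <stddef.h>
#include <stdint.h>

typedef struct node_s {
    uint32_t       key;
    struct node_s *left;
    struct node_s *right;
    struct node_s *parent;
} node_t;

static node_t *
rbtree_lookup(node_t *root, node_t *sentinel, uint32_t hash)
{
    node_t *node = root;

    while (node != sentinel) {
        if (hash < node->key) {     /* the faulting line: if a corrupted node  */
            node = node->left;      /* handed us a garbage child pointer, the  */
            continue;               /* next key dereference hits bad memory    */
        }
        if (hash > node->key) {
            node = node->right;
            continue;
        }
        return node;                /* equal keys: caller then compares names  */
    }
    return NULL;
}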

Re: nginx and FreeBSD11

2016-09-07 Thread Matt Smith

On Sep 07 22:13, Slawa Olhovchenkov wrote:

I am have strange issuse with nginx on FreeBSD11.
I am have FreeBSD11 instaled over STABLE-10.
nginx build for FreeBSD10 and run w/o recompile work fine.
nginx build for FreeBSD11 crushed inside rbtree lookups: next node
totaly craped.

I am see next potential cause:

1) clang 3.8 code generation issuse
2) system library issuse

may be i am miss something?

How to find real cause?


I am running stable/11 on amd64 and using the nginx-devel port with 
libressl-devel and I have no issues at all. It works fine. Have you 
rebuilt *all* ports since updating from stable/10?


--
Matt