Re: CURRENT slow and shaky network stability

2016-04-13 Thread O. Hartmann
On Sun, 10 Apr 2016 07:16:56 -0700
Cy Schubert  wrote:

> [deeply nested quote of the 2016-04-02/04-04 exchange elided; the same
> text appears unmangled in the messages below. The archived copy breaks
> off inside the quote, before any new reply text.]

Re: CURRENT slow and shaky network stability

2016-04-11 Thread Adrian Chadd
Can you try 'ifconfig wlan0 promisc' instead and see if that helps?



-a
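A minimal way to try this, assuming the interface is indeed wlan0 (the
flags check is only for verification):

    # put the interface in promiscuous mode
    ifconfig wlan0 promisc
    # confirm PROMISC appears in the flags line
    ifconfig wlan0 | head -1
    # undo it later
    ifconfig wlan0 -promisc

If the stalls stop while PROMISC is set, that points the same way as the
tcpdump observation in the next message.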


Re: CURRENT slow and shaky network stability

2016-04-11 Thread Poul-Henning Kamp


I have been trying to capture a packet trace for the breaking SSH
and, while not a statistically rigorous conclusion, it doesn't seem
to happen when I run a tcpdump on wlan0.

-- 
Poul-Henning Kamp   | UNIX since Zilog Zeus 3.20
p...@freebsd.org | TCP/IP since RFC 956
FreeBSD committer   | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.
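A capture along these lines reproduces that test (a sketch; the output
file name is arbitrary):

    # capture only ssh traffic; as a side effect tcpdump puts wlan0
    # into promiscuous mode, which may itself mask the problem
    tcpdump -i wlan0 -w ssh-break.pcap port 22

That side effect is consistent with the promisc suggestion above: if the
hangs disappear whenever tcpdump runs, promiscuous mode rather than the
capture itself is the likely variable.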


Re: CURRENT slow and shaky network stability

2016-04-10 Thread Cy Schubert
In message <20160409105444.7020f2f1.ohart...@zedat.fu-berlin.de>, "O.
Hartmann" writes:
> [quoted text elided; it duplicates the 2016-04-02 messages below. The
> archived copy breaks off before any new reply text.]
Re: CURRENT slow and shaky network stability

2016-04-09 Thread O. Hartmann
On Mon, 04 Apr 2016 23:46:08 -0700, Cy Schubert wrote:

> [quoted text elided; it duplicates the 2016-04-02 messages below. The
> archived copy breaks off before any new reply text.]

Re: CURRENT slow and shaky network stability

2016-04-05 Thread Cy Schubert
In message <20160405092712.131ee...@freyja.zeit4.iv.bundesimmobilien.de>,
"O. Hartmann" writes:
> [quoted text elided; it duplicates the 2016-04-02 messages below. The
> archived copy breaks off before any new reply text.]

Re: CURRENT slow and shaky network stability

2016-04-05 Thread O. Hartmann
On Mon, 04 Apr 2016 23:46:08 -0700
Cy Schubert  wrote:

> [quoted text elided; it duplicates the 2016-04-02 messages below. The
> archived copy breaks off before any new reply text.]

Re: CURRENT slow and shaky network stability

2016-04-05 Thread Cy Schubert
In message <20160405082047.670d7...@freyja.zeit4.iv.bundesimmobilien.de>,
"O. Hartmann" writes:
> [quoted text elided; it duplicates the 2016-04-02 messages below. The
> archived copy breaks off before any new reply text.]

Re: CURRENT slow and shaky network stability

2016-04-05 Thread O. Hartmann
On Sat, 02 Apr 2016 16:14:57 -0700
Cy Schubert  wrote:

> [quoted text elided; it duplicates the 2016-04-02 messages below. The
> archived copy breaks off before any new reply text.]

Re: CURRENT slow and shaky network stability

2016-04-02 Thread Cy Schubert
In message <20160402113910.14de7eaf.ohart...@zedat.fu-berlin.de>, "O.
Hartmann" writes:
> [earlier quoted discussion elided; it is duplicated below]
>
> By the way - it might be of interest and some hint.
>
> One of my boxes is acting as server and gateway. It utilises NAT and
> IPFW. When it is under high load, as it was today, sometimes passing
> the network flow from the ISP into the network for the clients is
> extremely slow. I do not consider this the reason for the collapsing
> ssh sessions, since the incident also happens under no load, but in
> the overall view onto the problem it could be a hint - I hope.

Natd is a critical part of your network infrastructure. Start it with
rtprio 1 natd, or apply rtprio 1 to the running process after the fact.
It won't hurt, and it takes this variable out of consideration, as much
as we can.


-- 
Cheers,
Cy Schubert  or 
FreeBSD UNIX:     Web:  http://www.FreeBSD.org

The need of the many outweighs the greed of the few.
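Applying it after the fact would look something like this (a sketch;
rtprio(1) takes the target pid prefixed with a dash in its set form, and
the pgrep usage assumes a single natd process):

    # give the already-running natd realtime priority 1
    rtprio 1 -"$(pgrep natd)"
    # report the process's current realtime priority to confirm
    rtprio "$(pgrep natd)"

Starting it as rtprio 1 natd, as suggested above, achieves the same
from the outset.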




Re: CURRENT slow and shaky network stability

2016-04-02 Thread Cy Schubert
In message <20160402231955.41b05526.ohart...@zedat.fu-berlin.de>, "O.
Hartmann" writes:
> [quoted text elided; it duplicates the messages below. The archived
> copy breaks off before any new reply text.]

Re: CURRENT slow and shaky network stability

2016-04-02 Thread Cy Schubert
In message , Kevin Oberman writes:
> [quoted text elided; it duplicates Kevin Oberman's message below. The
> archived copy breaks off before any new reply text.]

Re: CURRENT slow and shaky network stability

2016-04-02 Thread Cy Schubert
In message <20160402105503.7ede5be1.ohart...@zedat.fu-berlin.de>, "O.
Hartmann" writes:
> Am Sat, 02 Apr 2016 01:07:55 -0700, Cy Schubert wrote:
>
> > In message <56f6c6b0.6010...@protected-networks.net>, Michael Butler
> > writes:
> > > -current is not great for interactive use at all. The strategy of
> > > pre-emptively dropping idle processes to swap is hurting .. big time.
> >
> > FreeBSD doesn't "preemptively" or arbitrarily push pages out to disk.
> > LRU doesn't do this.
> >
> > > Compare inactive memory to swap in this example ..
> > >
> > > 110 processes: 1 running, 108 sleeping, 1 zombie
> > > CPU:  1.2% user,  0.0% nice,  4.3% system,  0.0% interrupt, 94.5% idle
> > > Mem: 474M Active, 1609M Inact, 764M Wired, 281M Buf, 119M Free
> > > Swap: 4096M Total, 917M Used, 3178M Free, 22% Inuse
> >
> > To analyze this you need to capture vmstat output. You'll see the
> > free pool dip below a threshold and pages go out to disk in response.
> > If you have daemons with small working sets, pages that are not part
> > of the working sets for daemons or applications will eventually be
> > paged out. This is not a bad thing. In your example above, the 281 MB
> > of UFS buffers are more active than the 917 MB paged out. If it's
> > paged out and never used again, then it doesn't hurt. However, the
> > 281 MB of buffers saves you I/O. The inactive pages are part of your
> > free pool that were active at one time but now are not. They may be
> > reclaimed, and if they are, you've just saved more I/O.
> >
> > Top is a poor tool to analyze memory use. Vmstat is the better tool
> > to help understand memory use. Inactive memory isn't a bad thing per
> > se. Monitor page outs, scan rate and page reclaims.
>
> I give up! Tried to check via ssh/vmstat what is going on. Last lines
> before the broken pipe:
>
> [...]
> procs  memory   page                     disks     faults          cpu
>  r b w  avm   fre   flt  re  pi  po    fr   sr ad0 ad1   in     sy     cs us sy id
> 22 0 22 5.8G  1.0G 46319   0   0   0 55721 1297   0   4  219  23907   5400 95  5  0
> 22 0 22 5.4G  1.3G 51733   0   0   0 72436 1162   0   0  108  40869   3459 93  7  0
> 15 0 22  12G  1.2G 54400   0  27   0 52188 1160   0  42  148  52192   4366 91  9  0
> 14 0 22  12G  1.0G 44954   0  37   0 37550 1179   0  39  141  86209   4368 88 12  0
> 26 0 22  12G  1.1G 60258   0  81   0 69459 1119   0  27  123 779569 704359 87 13  0
> 29 3 22  13G  774M 50576   0  68   0 32204 1304   0   2  102 507337 484861 93  7  0
> 27 0 22  13G  937M 47477   0  48   0 59458 1264   3   2  112  68131  44407 95  5  0
> 36 0 22  13G  829M 83164   0   2   0 82575 1225   1   0  126  99366  38060 89 11  0
> 35 0 22 6.2G  1.1G 98803   0  13   0 121375 1217   2   8  112  99371   4999 85 15  0
> 34 0 22  13G  723M 54436   0  20   0 36952 1276   0  17  153  29142   4431 95  5  0
> Fssh_packet_write_wait: Connection to 192.168.0.1 port 22: Broken pipe

How many CPUs does FreeBSD see? (CPUs being the number of cores times
threads; i.e. my dual-core Intel has two threads per core, so FreeBSD
sees four CPUs.)

The load on the box shouldn't exceed more than two processes per CPU or
you will notice performance issues. Ideally we look at load average
first. If it's high, then we check CPU%. If that looks good, we look at
memory and I/O. With the scant information at hand right now I see a
possible CPU issue. Scan rate looks high, but there's no paging, so I'd
consider it borderline.


-- 
Cheers,
Cy Schubert  or 
FreeBSD UNIX:     Web:  http://www.FreeBSD.org

The need of the many outweighs the greed of the few.
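A quick way to answer the CPU question and check load (standard FreeBSD
tools; hw.ncpu reports logical CPUs, i.e. cores times hardware threads):

    # number of logical CPUs the kernel sees
    sysctl hw.ncpu
    # 1-, 5- and 15-minute load averages
    uptime
    # per-CPU usage, interactively
    top -P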






Re: CURRENT slow and shaky network stability

2016-04-02 Thread Kevin Oberman
On Sat, Apr 2, 2016 at 2:19 PM, O. Hartmann 
wrote:

> Am Sat, 2 Apr 2016 11:39:10 +0200
> "O. Hartmann"  schrieb:
>
> > Am Sat, 2 Apr 2016 10:55:03 +0200
> > "O. Hartmann"  schrieb:
> >
> > > Am Sat, 02 Apr 2016 01:07:55 -0700
> > > Cy Schubert  schrieb:
> > >
> > > > In message <56f6c6b0.6010...@protected-networks.net>, Michael
> Butler writes:
> > > > > -current is not great for interactive use at all. The strategy of
> > > > > pre-emptively dropping idle processes to swap is hurting .. big
> time.
> > > >
> > > > FreeBSD doesn't "preemptively" or arbitrarily push pages out to
> disk. LRU
> > > > doesn't do this.
> > > >
> > > > >
> > > > > Compare inactive memory to swap in this example ..
> > > > >
> > > > > 110 processes: 1 running, 108 sleeping, 1 zombie
> > > > > CPU:  1.2% user,  0.0% nice,  4.3% system,  0.0% interrupt, 94.5%
> idle
> > > > > Mem: 474M Active, 1609M Inact, 764M Wired, 281M Buf, 119M Free
> > > > > Swap: 4096M Total, 917M Used, 3178M Free, 22% Inuse
> > > >
> > > > To analyze this you need to capture vmstat output. You'll see the
> free pool
> > > > dip below a threshold and pages go out to disk in response. If you
> have
> > > > daemons with small working sets, pages that are not part of the
> working
> > > > sets for daemons or applications will eventually be paged out. This
> is not
> > > > a bad thing. In your example above, the 281 MB of UFS buffers are
> more
> > > > active than the 917 MB paged out. If it's paged out and never used
> again,
> > > > then it doesn't hurt. However the 281 MB of buffers saves you I/O.
> The
> > > > inactive pages are part of your free pool that were active at one
> time but
> > > > now are not. They may be reclaimed and if they are, you've just
> saved more
> > > > I/O.
> > > >
> > > > Top is a poor tool to analyze memory use. Vmstat is the better tool
> to help
> > > > understand memory use. Inactive memory isn't a bad thing per se.
> Monitor
> > > > page outs, scan rate and page reclaims.
> > > >
> > > >
> > >
> > > I give up! Tried to check via ssh/vmstat what is going on. Last lines
> before broken
> > > pipe:
> > >
> > > [...]
> > > procs  memory   pagedisks faults
>  cpu
> > > r b w  avm   fre   flt  re  pi  pofr   sr ad0 ad1   insycs
> us sy id
> > > 22 0 22 5.8G  1.0G 46319   0   0   0 55721 1297   0   4  219 23907
> 5400 95  5  0
> > > 22 0 22 5.4G  1.3G 51733   0   0   0 72436 1162   0   0  108 40869
> 3459 93  7  0
> > > 15 0 22  12G  1.2G 54400   0  27   0 52188 1160   0  42  148 52192
> 4366 91  9  0
> > > 14 0 22  12G  1.0G 44954   0  37   0 37550 1179   0  39  141 86209
> 4368 88 12  0
> > > 26 0 22  12G  1.1G 60258   0  81   0 69459 1119   0  27  123 779569
> 704359 87 13  0
> > > 29 3 22  13G  774M 50576   0  68   0 32204 1304   0   2  102 507337
> 484861 93  7  0
> > > 27 0 22  13G  937M 47477   0  48   0 59458 1264   3   2  112 68131
> 44407 95  5  0
> > > 36 0 22  13G  829M 83164   0   2   0 82575 1225   1   0  126 99366
> 38060 89 11  0
> > > 35 0 22 6.2G  1.1G 98803   0  13   0 121375 1217   2   8  112 99371
> 4999 85 15  0
> > > 34 0 22  13G  723M 54436   0  20   0 36952 1276   0  17  153 29142
> 4431 95  5  0
> > > Fssh_packet_write_wait: Connection to 192.168.0.1 port 22: Broken pipe
> > >
> > >
> > > This makes this crap system completely unusable. The server in
> > > question (FreeBSD 11.0-CURRENT #20 r297503: Sat Apr  2 09:02:41 CEST
> > > 2016 amd64) was running a poudriere bulk job. I cannot even determine
> > > which terminal goes down first - another one, idle for much longer
> > > than the one showing the "vmstat 5" output, is still alive!
> > >
> > > I consider this a serious bug, and what has happened since this
> > > "fancy" update is no benefit. :-(
> >
> > By the way - it might be of interest and serve as a hint.
> >
> > One of my boxes acts as server and gateway, using NAT and IPFW. When it
> > is under high load, as it was today, passing traffic from the ISP to
> > the clients on the internal network is sometimes extremely slow. I do
> > not consider this the reason for the collapsing ssh sessions, since the
> > incident also happens under no load, but in the overall view of the
> > problem it could be a hint - I hope.
>
> I just checked on one box that "broke the pipe" very quickly after I
> started poudriere, while it had done well for a couple of hours before
> the pipe broke. It seems to be load dependent when the ssh session gets
> wrecked. More importantly, after the long-haul poudriere run I rebooted
> the box, tried again, and got the mentioned broken pipe within a couple
> of minutes of poudriere running. Then I left the box alone for several
> hours, logged in again and checked the swap. Although there had been no
> load or other pressure for hours, 31% of swap was still in use (the box
> has 16 GB of RAM and is driven by a XEON E3-1245 V2).
>


Re: CURRENT slow and shaky network stability

2016-04-02 Thread O. Hartmann
Am Sat, 2 Apr 2016 11:39:10 +0200
"O. Hartmann"  schrieb:

> Am Sat, 2 Apr 2016 10:55:03 +0200
> "O. Hartmann"  schrieb:
> 
> > Am Sat, 02 Apr 2016 01:07:55 -0700
> > Cy Schubert  schrieb:
> >   
> > > In message <56f6c6b0.6010...@protected-networks.net>, Michael Butler 
> > > writes:
> > > > -current is not great for interactive use at all. The strategy of
> > > > pre-emptively dropping idle processes to swap is hurting .. big time.   
> > > >
> > > 
> > > FreeBSD doesn't "preemptively" or arbitrarily push pages out to disk. LRU 
> > > doesn't do this.
> > > 
> > > > 
> > > > Compare inactive memory to swap in this example ..
> > > > 
> > > > 110 processes: 1 running, 108 sleeping, 1 zombie
> > > > CPU:  1.2% user,  0.0% nice,  4.3% system,  0.0% interrupt, 94.5% idle
> > > > Mem: 474M Active, 1609M Inact, 764M Wired, 281M Buf, 119M Free
> > > > Swap: 4096M Total, 917M Used, 3178M Free, 22% Inuse  
> > > 
> > > To analyze this you need to capture vmstat output. You'll see the
> > > free pool dip below a threshold and pages go out to disk in
> > > response. If you have daemons with small working sets, pages that
> > > are not part of the working sets for daemons or applications will
> > > eventually be paged out. This is not a bad thing. In your example
> > > above, the 281 MB of UFS buffers are more active than the 917 MB
> > > paged out. If it's paged out and never used again, then it doesn't
> > > hurt. However the 281 MB of buffers saves you I/O. The inactive
> > > pages are part of your free pool that were active at one time but
> > > now are not. They may be reclaimed and if they are, you've just
> > > saved more I/O.
> > > 
> > > Top is a poor tool to analyze memory use. Vmstat is the better tool
> > > to help understand memory use. Inactive memory isn't a bad thing per
> > > se. Monitor page outs, scan rate and page reclaims.
> > > 
> > > 
> > 
> > I give up! Tried to check via ssh/vmstat what is going on. Last lines
> > before broken pipe:
> > 
> > [...]
> > procs      memory        page                      disks     faults          cpu
> > r  b  w  avm   fre    flt  re  pi  po     fr   sr ad0 ad1   in     sy     cs us sy id
> > 22 0 22 5.8G  1.0G  46319   0   0   0  55721 1297   0   4  219  23907   5400 95  5  0
> > 22 0 22 5.4G  1.3G  51733   0   0   0  72436 1162   0   0  108  40869   3459 93  7  0
> > 15 0 22  12G  1.2G  54400   0  27   0  52188 1160   0  42  148  52192   4366 91  9  0
> > 14 0 22  12G  1.0G  44954   0  37   0  37550 1179   0  39  141  86209   4368 88 12  0
> > 26 0 22  12G  1.1G  60258   0  81   0  69459 1119   0  27  123 779569 704359 87 13  0
> > 29 3 22  13G  774M  50576   0  68   0  32204 1304   0   2  102 507337 484861 93  7  0
> > 27 0 22  13G  937M  47477   0  48   0  59458 1264   3   2  112  68131  44407 95  5  0
> > 36 0 22  13G  829M  83164   0   2   0  82575 1225   1   0  126  99366  38060 89 11  0
> > 35 0 22 6.2G  1.1G  98803   0  13   0 121375 1217   2   8  112  99371   4999 85 15  0
> > 34 0 22  13G  723M  54436   0  20   0  36952 1276   0  17  153  29142   4431 95  5  0
> > Fssh_packet_write_wait: Connection to 192.168.0.1 port 22: Broken pipe
> > 
> > 
> > This makes this crap system completely unusable. The server in
> > question (FreeBSD 11.0-CURRENT #20 r297503: Sat Apr  2 09:02:41 CEST
> > 2016 amd64) was running a poudriere bulk job. I cannot even determine
> > which terminal goes down first - another one, idle for much longer
> > than the one showing the "vmstat 5" output, is still alive!
> > 
> > I consider this a serious bug, and what has happened since this
> > "fancy" update is no benefit. :-(
> 
> By the way - it might be of interest and serve as a hint.
> 
> One of my boxes acts as server and gateway, using NAT and IPFW. When it
> is under high load, as it was today, passing traffic from the ISP to the
> clients on the internal network is sometimes extremely slow. I do not
> consider this the reason for the collapsing ssh sessions, since the
> incident also happens under no load, but in the overall view of the
> problem it could be a hint - I hope.

I just checked on one box that "broke the pipe" very quickly after I
started poudriere, while it had done well for a couple of hours before
the pipe broke. It seems to be load dependent when the ssh session gets
wrecked. More importantly, after the long-haul poudriere run I rebooted
the box, tried again, and got the mentioned broken pipe within a couple
of minutes of poudriere running. Then I left the box alone for several
hours, logged in again and checked the swap. Although there had been no
load or other pressure for hours, 31% of swap was still in use (the box
has 16 GB of RAM and is driven by a XEON E3-1245 V2).
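
(To quantify that leftover swap and see what it belongs to, something like
the following suffices - a rough sketch, not a diagnosis:)

    swapinfo -h                     # per-device swap usage, human readable
    ps auxww | sort -rnk5 | head    # sort by VSZ; a large VSZ with a small
                                    # RSS marks a process that was paged out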




Re: CURRENT slow and shaky network stability

2016-04-02 Thread O. Hartmann
Am Sat, 2 Apr 2016 10:55:03 +0200
"O. Hartmann"  schrieb:

> Am Sat, 02 Apr 2016 01:07:55 -0700
> Cy Schubert  schrieb:
> 
> > In message <56f6c6b0.6010...@protected-networks.net>, Michael Butler 
> > writes:  
> > > -current is not great for interactive use at all. The strategy of
> > > pre-emptively dropping idle processes to swap is hurting .. big time.
> > 
> > FreeBSD doesn't "preemptively" or arbitrarily push pages out to disk. LRU 
> > doesn't do this.
> >   
> > > 
> > > Compare inactive memory to swap in this example ..
> > > 
> > > 110 processes: 1 running, 108 sleeping, 1 zombie
> > > CPU:  1.2% user,  0.0% nice,  4.3% system,  0.0% interrupt, 94.5% idle
> > > Mem: 474M Active, 1609M Inact, 764M Wired, 281M Buf, 119M Free
> > > Swap: 4096M Total, 917M Used, 3178M Free, 22% Inuse
> > 
> > To analyze this you need to capture vmstat output. You'll see the free pool 
> > dip below a threshold and pages go out to disk in response. If you have 
> > daemons with small working sets, pages that are not part of the working 
> > sets for daemons or applications will eventually be paged out. This is not 
> > a bad thing. In your example above, the 281 MB of UFS buffers are more 
> > active than the 917 MB paged out. If it's paged out and never used again, 
> > then it doesn't hurt. However the 281 MB of buffers saves you I/O. The 
> > inactive pages are part of your free pool that were active at one time but 
> > now are not. They may be reclaimed and if they are, you've just saved more 
> > I/O.
> > 
> > Top is a poor tool to analyze memory use. Vmstat is the better tool to help 
> > understand memory use. Inactive memory isn't a bad thing per se. Monitor 
> > page outs, scan rate and page reclaims.
> > 
> >   
> 
> I give up! Tried to check via ssh/vmstat what is going on. Last lines
> before broken pipe:
> 
> [...]
> procs      memory        page                      disks     faults          cpu
> r  b  w  avm   fre    flt  re  pi  po     fr   sr ad0 ad1   in     sy     cs us sy id
> 22 0 22 5.8G  1.0G  46319   0   0   0  55721 1297   0   4  219  23907   5400 95  5  0
> 22 0 22 5.4G  1.3G  51733   0   0   0  72436 1162   0   0  108  40869   3459 93  7  0
> 15 0 22  12G  1.2G  54400   0  27   0  52188 1160   0  42  148  52192   4366 91  9  0
> 14 0 22  12G  1.0G  44954   0  37   0  37550 1179   0  39  141  86209   4368 88 12  0
> 26 0 22  12G  1.1G  60258   0  81   0  69459 1119   0  27  123 779569 704359 87 13  0
> 29 3 22  13G  774M  50576   0  68   0  32204 1304   0   2  102 507337 484861 93  7  0
> 27 0 22  13G  937M  47477   0  48   0  59458 1264   3   2  112  68131  44407 95  5  0
> 36 0 22  13G  829M  83164   0   2   0  82575 1225   1   0  126  99366  38060 89 11  0
> 35 0 22 6.2G  1.1G  98803   0  13   0 121375 1217   2   8  112  99371   4999 85 15  0
> 34 0 22  13G  723M  54436   0  20   0  36952 1276   0  17  153  29142   4431 95  5  0
> Fssh_packet_write_wait: Connection to 192.168.0.1 port 22: Broken pipe
> 
> 
> This makes this crap system completely unusable. The server in question
> (FreeBSD 11.0-CURRENT #20 r297503: Sat Apr  2 09:02:41 CEST 2016 amd64)
> was running a poudriere bulk job. I cannot even determine which terminal
> goes down first - another one, idle for much longer than the one showing
> the "vmstat 5" output, is still alive!
> 
> I consider this a serious bug, and what has happened since this "fancy"
> update is no benefit. :-(

By the way - it might be of interest and serve as a hint.

One of my boxes acts as server and gateway, using NAT and IPFW. When it
is under high load, as it was today, passing traffic from the ISP to the
clients on the internal network is sometimes extremely slow. I do not
consider this the reason for the collapsing ssh sessions, since the
incident also happens under no load, but in the overall view of the
problem it could be a hint - I hope.




Re: CURRENT slow and shaky network stability

2016-04-02 Thread O. Hartmann
Am Sat, 02 Apr 2016 01:07:55 -0700
Cy Schubert  schrieb:

> In message <56f6c6b0.6010...@protected-networks.net>, Michael Butler writes:
> > -current is not great for interactive use at all. The strategy of
> > pre-emptively dropping idle processes to swap is hurting .. big time.  
> 
> FreeBSD doesn't "preemptively" or arbitrarily push pages out to disk. LRU 
> doesn't do this.
> 
> > 
> > Compare inactive memory to swap in this example ..
> > 
> > 110 processes: 1 running, 108 sleeping, 1 zombie
> > CPU:  1.2% user,  0.0% nice,  4.3% system,  0.0% interrupt, 94.5% idle
> > Mem: 474M Active, 1609M Inact, 764M Wired, 281M Buf, 119M Free
> > Swap: 4096M Total, 917M Used, 3178M Free, 22% Inuse  
> 
> To analyze this you need to capture vmstat output. You'll see the free pool 
> dip below a threshold and pages go out to disk in response. If you have 
> daemons with small working sets, pages that are not part of the working 
> sets for daemons or applications will eventually be paged out. This is not 
> a bad thing. In your example above, the 281 MB of UFS buffers are more 
> active than the 917 MB paged out. If it's paged out and never used again, 
> then it doesn't hurt. However the 281 MB of buffers saves you I/O. The 
> inactive pages are part of your free pool that were active at one time but 
> now are not. They may be reclaimed and if they are, you've just saved more 
> I/O.
> 
> Top is a poor tool to analyze memory use. Vmstat is the better tool to help 
> understand memory use. Inactive memory isn't a bad thing per se. Monitor 
> page outs, scan rate and page reclaims.
> 
> 

I give up! Tried to check via ssh/vmstat what is going on. Last lines
before broken pipe:

[...]
procs      memory        page                      disks     faults          cpu
r  b  w  avm   fre    flt  re  pi  po     fr   sr ad0 ad1   in     sy     cs us sy id
22 0 22 5.8G  1.0G  46319   0   0   0  55721 1297   0   4  219  23907   5400 95  5  0
22 0 22 5.4G  1.3G  51733   0   0   0  72436 1162   0   0  108  40869   3459 93  7  0
15 0 22  12G  1.2G  54400   0  27   0  52188 1160   0  42  148  52192   4366 91  9  0
14 0 22  12G  1.0G  44954   0  37   0  37550 1179   0  39  141  86209   4368 88 12  0
26 0 22  12G  1.1G  60258   0  81   0  69459 1119   0  27  123 779569 704359 87 13  0
29 3 22  13G  774M  50576   0  68   0  32204 1304   0   2  102 507337 484861 93  7  0
27 0 22  13G  937M  47477   0  48   0  59458 1264   3   2  112  68131  44407 95  5  0
36 0 22  13G  829M  83164   0   2   0  82575 1225   1   0  126  99366  38060 89 11  0
35 0 22 6.2G  1.1G  98803   0  13   0 121375 1217   2   8  112  99371   4999 85 15  0
34 0 22  13G  723M  54436   0  20   0  36952 1276   0  17  153  29142   4431 95  5  0
Fssh_packet_write_wait: Connection to 192.168.0.1 port 22: Broken pipe


This makes this crap system completely unusable. The server in question
(FreeBSD 11.0-CURRENT #20 r297503: Sat Apr  2 09:02:41 CEST 2016 amd64)
was running a poudriere bulk job. I cannot even determine which terminal
goes down first - another one, idle for much longer than the one showing
the "vmstat 5" output, is still alive!

I consider this a serious bug, and what has happened since this "fancy"
update is no benefit. :-(




Re: CURRENT slow and shaky network stability

2016-04-02 Thread Cy Schubert
In message <56f6c6b0.6010...@protected-networks.net>, Michael Butler writes:
> -current is not great for interactive use at all. The strategy of
> pre-emptively dropping idle processes to swap is hurting .. big time.

FreeBSD doesn't "preemptively" or arbitrarily push pages out to disk. LRU 
doesn't do this.

> 
> Compare inactive memory to swap in this example ..
> 
> 110 processes: 1 running, 108 sleeping, 1 zombie
> CPU:  1.2% user,  0.0% nice,  4.3% system,  0.0% interrupt, 94.5% idle
> Mem: 474M Active, 1609M Inact, 764M Wired, 281M Buf, 119M Free
> Swap: 4096M Total, 917M Used, 3178M Free, 22% Inuse

To analyze this you need to capture vmstat output. You'll see the free pool 
dip below a threshold and pages go out to disk in response. If you have 
daemons with small working sets, pages that are not part of the working 
sets for daemons or applications will eventually be paged out. This is not 
a bad thing. In your example above, the 281 MB of UFS buffers are more 
active than the 917 MB paged out. If it's paged out and never used again, 
then it doesn't hurt. However the 281 MB of buffers saves you I/O. The 
inactive pages are part of your free pool that were active at one time but 
now are not. They may be reclaimed and if they are, you've just saved more 
I/O.

Top is a poor tool to analyze memory use. Vmstat is the better tool to help 
understand memory use. Inactive memory isn't a bad thing per se. Monitor 
page outs, scan rate and page reclaims.
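
(Concretely, following those three indicators amounts to something like
this - a minimal sketch, assuming a 5-second sampling interval:)

    # watch the page-out (po), scan-rate (sr) and reclaim (re) columns
    vmstat 5
    # sustained non-zero "po" together with a high "sr" means real memory
    # pressure; a high "re" alone only says the system is close to the line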


-- 
Cheers,
Cy Schubert  or 
FreeBSD UNIX:     Web:  http://www.FreeBSD.org

The need of the many outweighs the greed of the few.







Re: CURRENT slow and shaky network stability

2016-04-02 Thread Cy Schubert
In message <56f6c6b0.6010...@protected-networks.net>, Michael Butler writes:
> -current is not great for interactive use at all. The strategy of
> pre-emptively dropping idle processes to swap is hurting .. big time.
> 
> Compare inactive memory to swap in this example ..
> 
> 110 processes: 1 running, 108 sleeping, 1 zombie
> CPU:  1.2% user,  0.0% nice,  4.3% system,  0.0% interrupt, 94.5% idle
> Mem: 474M Active, 1609M Inact, 764M Wired, 281M Buf, 119M Free
> Swap: 4096M Total, 917M Used, 3178M Free, 22% Inuse
> 
>   PID USERNAME   THR PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
>  1819 imb          1  28    0   213M 11284K select  1 147:44   5.97% gkrellm
> 59238 imb         43  20    0   980M   424M select  0  10:07   1.92% firefox
> 
>  .. it shouldn't start randomly swapping out processes because they're
> used infrequently when there's more than enough RAM to spare ..

Inactive memory will over time be used to "top up" the free memory pool.

> 
> It also shows up when trying to reboot .. on all of my gear, 90 seconds
> of "fail-safe" time-out is no longer enough when a good proportion of
> daemons have been dropped onto swap and must be brought back in to flush
> their data segments :-(

What does vmstat 5 display? A high scan rate is indicative of memory in short 
supply.

My laptop has 6 GB RAM, of which I've allocated 2.5 GB for ARC. Top shows
that 3.6 GB are wired (not pageable), leaving 2.4 GB available for apps.
The laptop is in the middle of a four-thread buildworld. It's using 1.8 MB
of swap so far. ARC is 2560 MB. UFS cache is only 49 MB at the moment but
will balloon to 603 MB during installworld. When it does, my swap grows to
100-150 MB. (The reason is the large 2.5 GB ARC *and* a large 603 MB UFS
cache.)

Notice the vmstat output below. On line 8 of my vmstat output the scan
rate jumps to 10K. Two pages per second are paged out. This is due to the
free memory pool (in line 7) dropping to 49 MB, so it freed up a few pages
by paging out. Notice that page reclaims (re) are high at times. These
pages were scheduled to be paged out but were used before they were. This
indicates that my laptop is running pretty close to the line between
paging a lot and not paging at all.

slippy$ vmstat 5
 procs     memory        page                      disks     faults         cpu
 r b w   avm     fre   flt  re  pi  po    fr    sr ad0 da0   in    sy    cs us sy id
 4 0 0   3413M   208M 12039  11   9   0 12804   170   0   0  533  7865  2722 59  3 38
 4 0 0   3260M   376M 18550   0   0   0 27436   386   9   8  576 23029 30705 94  6  0
 4 1 0   3432M   171M 25345   0   6   0 15340   360  10  12  530  2524  1362 97  3  0
 4 0 0   3395M   208M 20904   0   0   0 22995   395  12  11  517  5427  1142 97  3  0
 4 0 0   3695M    53M 20102   0   0   0 12482   473  17  10  517  1383  1244 98  2  0
 4 0 0   3404M   371M 22996  14  10   0 39691  4557  14   8  503  8540  1813 96  3  1
 4 1 0   3673M    49M 22398 441  22   0  6778   429  10  13  543  3034  1609 97  3  0
 4 0 0   3396M   439M 19522  26   3   2 33901 10137  11  15  545  5617  1686 97  3  0
 4 0 0   3489M   412M 26636   0   0   0 25710   393  10  12  531  5287  1450 95  3  2
 4 0 0   3558M   337M 23364 329  13   0 20051   410  11  15  561  6052  1702 96  3  0
 4 0 0   3492M   335M 18244   0   3   0 18550   444  14   7  512  5140  2087 98  2  0
 4 0 0   3412M   404M 21765   0   0   0 25611   388   7  12  533  7873  1394 97  3  0
 5 0 0   3604M   189M 19044   0   0   0  8404   505   7  10  644 63940 90591 93  6  1
 4 0 0   3533M   363M 13079 423  17   0 22327   464  11   8  501  7960  4194 94  3  3
 4 0 0   3222M   616M 20822 218  17   0 34180   294  11  13  550  5602  1850 95  4  1
 4 0 0   3307M   542M 19639  32   3   0 15940   345  13  10  516  2589  1505 96  3  1
 4 0 0   3320M   527M 19656   0   1   0 19191   397  14   8  514  1886  1257 97  3  0
 4 0 0   3295M   605M 21676 910  35   0 25978   356  14  12  533  3039  1490 95  4  0

Page outs are the first place to look. If there are no page outs, page
reclaims will tell you whether your system may be borderline. A high scan
rate says that your working set size is large enough to put some pressure
on VM. Ideally in this case I should add memory, but since I'm running
this close to the line (I chose my ARC maximum well) I'll just save my
money instead. Also, FreeBSD is doing exactly what it should in this
scenario.
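
(For reference, the ARC maximum mentioned above is a loader tunable; a
sketch with the 2.5 GB value from this message:)

    # /boot/loader.conf - cap the ZFS ARC so apps and UFS buffers keep headroom
    vfs.zfs.arc_max="2560M"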

Top is a good tool but it doesn't tell the whole picture. Run vmstat to
get a better picture of your memory use.

Following this thread throughout the day (on my cellphone), I'm not
convinced this is a FreeBSD O/S problem. Check your apps. What are their
working set sizes? Do your apps have a small or large locality of
reference? Remember, O/S tuning is a matter of robbing Peter to pay Paul.
Reducing the resources used by applications will pay back bigger
dividends.

Hope this helps.


-- 
Cheers,
Cy Schubert  or 
FreeBSD UNIX:     Web:  

Re: CURRENT slow and shaky network stability

2016-04-02 Thread Cy Schubert
In message <201603300728.u2u7sdwc092...@gw.catspoiler.org>, Don Lewis 
writes:
> On 29 Mar, To: ohart...@zedat.fu-berlin.de wrote:
> > On 28 Mar, Don Lewis wrote:
> >> On 28 Mar, O. Hartmann wrote:
> >  
> >> If I get a chance, I'll try booting my FreeBSD 11 machine with less
> >> RAM to see if that is a trigger.
> > 
> > I just tried cranking hw.physmem down to 8 GB on 11.0-CURRENT r297204,
> > GENERIC kernel.  /boot/loader.conf contains:
> >   geom_mirror_load="YES"
> >   kern.geom.label.disk_ident.enable="0"
> >   kern.geom.label.gptid.enable="0"
> >   zfs_load="YES"
> >   vboxdrv_load="YES"
> >   hw.physmem="8G"
> > 
> > /etc/sysctl.conf contains:
> >   kern.ipc.shm_allow_removed=1
> > 
> > No /etc/src.conf and nothing of that should matter in /etc/make.conf.
> > 
> > 
> > This is what I see after running
> > poudriere ports -p whatever -u
> > 
> > last pid:  2102;  load averages:  0.24,  0.52,  0.36  up 0+00:06:54  14:13:51
> > 52 processes:  1 running, 51 sleeping
> > CPU:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
> > Mem: 95M Active, 20M Inact, 1145M Wired, 39K Buf, 5580M Free
> > ARC: 595M Total, 256M MFU, 248M MRU, 16K Anon, 14M Header, 78M Other
> > Swap: 40G Total, 40G Free
> > 
> > No swap used, inactive memory low, no interactivity problems.  Next I'll
> > try r297267, which is what I believe you are running.  I scanned the
> > commit logs between r297204 and r297267 and didn't see anything terribly
> > suspicious looking.
> 
> No problems here with r297267 either.  I did a bunch of small poudriere
> runs since the system was first booted.  Usable RAM is still dialed back
> to 8 GB.  A bit of swap is in use, mostly because nginx, which has been
> unused since the system was booted, got swapped out.  Inactive memory is
> low now that poudriere is done.
> 
> last pid: 75471;  load averages:  0.21,  0.15,  0.19  up 0+07:36:07  00:24:00
> 50 processes:  1 running, 49 sleeping
> CPU:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
> Mem: 5988K Active, 14M Inact, 2641M Wired, 41K Buf, 4179M Free
> ARC: 790M Total, 575M MFU, 169M MRU, 16K Anon, 9618K Header, 36M Other
> Swap: 40G Total, 50M Used, 40G Free
> 
> Do you use tmpfs?  Anything stored in there will get stashed in inactive
> memory and/or swap.

Tmpfs objects are treated like any other pages in memory. If the pages are
recent enough they will be active.


-- 
Cheers,
Cy Schubert  or 
FreeBSD UNIX:     Web:  http://www.FreeBSD.org

The need of the many outweighs the greed of the few.




Re: CURRENT slow and shaky network stability

2016-04-01 Thread O. Hartmann
Am Fri, 1 Apr 2016 13:37:14 -0700 (PDT)
Don Lewis  schrieb:

> On  1 Apr, O. Hartmann wrote:
> > Am Wed, 30 Mar 2016 00:28:39 -0700 (PDT)
> > Don Lewis  schrieb:  
> 
> >> Do you use tmpfs?  Anything stored in there will get stashed in inactive
> >> memory and/or swap.  
> > [...]
> > 
> > Yes, /var/run  and /tmp are on tmpfs   
> 
> There is probably not enough stashed in /var/run to make a difference,
> but if there is a lot in /tmp, it could bloat the amount of inactive
> memory and cause other things to be pushed to swap.  What does
>   df /tmp
> say?
> 
> Depending on the workload, you might find that the change committed in
> r280963 helps interactivity.

The problems are still present with r297495, and all systems running
CURRENT show symptoms of being unresponsive under load, even on the
consoles/ssh connections of non-X11 systems!

> 
> I still can't explain the ssh connection timeout problems that you are
> seeing.  What does
>   ps lax
> report as the MWCHAN for the stuck ssh and/or sshd processes?
> 





Re: CURRENT slow and shaky network stability

2016-04-01 Thread Don Lewis
On  1 Apr, O. Hartmann wrote:
> Am Wed, 30 Mar 2016 00:28:39 -0700 (PDT)
> Don Lewis  schrieb:

>> Do you use tmpfs?  Anything stored in there will get stashed in inactive
>> memory and/or swap.
> [...]
> 
> Yes, /var/run  and /tmp are on tmpfs 

There is probably not enough stashed in /var/run to make a difference,
but if there is a lot in /tmp, it could bloat the amount of inactive
memory and cause other things to be pushed to swap.  What does
df /tmp
say?

Depending on the workload, you might find that the change committed in
r280963 helps interactivity.

I still can't explain the ssh connection timeout problems that you are
seeing.  What does
ps lax
report as the MWCHAN for the stuck ssh and/or sshd processes?
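
(Concretely, the two checks - a sketch; the egrep pattern just keeps the
header line so the MWCHAN column stays labelled:)

    df /tmp                       # how much tmpfs is pinning in inactive/swap
    ps lax | egrep 'MWCHAN|ssh'   # wait channel of the stuck ssh/sshd processes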



Re: CURRENT slow and shaky network stability

2016-04-01 Thread O. Hartmann
Am Wed, 30 Mar 2016 00:28:39 -0700 (PDT)
Don Lewis  schrieb:

> On 29 Mar, To: ohart...@zedat.fu-berlin.de wrote:
> > On 28 Mar, Don Lewis wrote:  
> >> On 28 Mar, O. Hartmann wrote:  
> >
> >> If I get a chance, I'll try booting my FreeBSD 11 machine with less
> >> RAM to see if that is a trigger.
> > 
> > I just tried cranking hw.physmem down to 8 GB on 11.0-CURRENT r297204,
> > GENERIC kernel.  /boot/loader.conf contains:
> >   geom_mirror_load="YES"
> >   kern.geom.label.disk_ident.enable="0"
> >   kern.geom.label.gptid.enable="0"
> >   zfs_load="YES"
> >   vboxdrv_load="YES"
> >   hw.physmem="8G"
> > 
> > /etc/sysctl.conf contains:
> >   kern.ipc.shm_allow_removed=1
> > 
> > No /etc/src.conf and nothing of that should matter in /etc/make.conf.
> > 
> > 
> > This is what I see after running
> > poudriere ports -p whatever -u
> > 
> > last pid:  2102;  load averages:  0.24,  0.52,  0.36  up 0+00:06:54  14:13:51
> > 52 processes:  1 running, 51 sleeping
> > CPU:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
> > Mem: 95M Active, 20M Inact, 1145M Wired, 39K Buf, 5580M Free
> > ARC: 595M Total, 256M MFU, 248M MRU, 16K Anon, 14M Header, 78M Other
> > Swap: 40G Total, 40G Free
> > 
> > No swap used, inactive memory low, no interactivity problems.  Next I'll
> > try r297267, which is what I believe you are running.  I scanned the
> > commit logs between r297204 and r297267 and didn't see anything terribly
> > suspicious looking.  
> 
> No problems here with r297267 either.  I did a bunch of small poudriere
> runs since the system was first booted.  Usable RAM is still dialed back
> to 8 GB.  A bit of swap is in use, mostly because nginx, which has been
> unused since the system was booted, got swapped out.  Inactive memory is
> low now that poudriere is done.
> 
> last pid: 75471;  load averages:  0.21,  0.15,  0.19  up 0+07:36:07  00:24:00
> 50 processes:  1 running, 49 sleeping
> CPU:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
> Mem: 5988K Active, 14M Inact, 2641M Wired, 41K Buf, 4179M Free
> ARC: 790M Total, 575M MFU, 169M MRU, 16K Anon, 9618K Header, 36M Other
> Swap: 40G Total, 50M Used, 40G Free
> 
> Do you use tmpfs?  Anything stored in there will get stashed in inactive
> memory and/or swap.
[...]

Yes, /var/run  and /tmp are on tmpfs 




Re: CURRENT slow and shaky network stability

2016-03-30 Thread Don Lewis
On 29 Mar, To: ohart...@zedat.fu-berlin.de wrote:
> On 28 Mar, Don Lewis wrote:
>> On 28 Mar, O. Hartmann wrote:
>  
>> If I get a chance, I'll try booting my FreeBSD 11 machine with less
>> RAM to see if that is a trigger.
> 
> I just tried cranking hw.physmem down to 8 GB on 11.0-CURRENT r297204,
> GENERIC kernel.  /boot/loader.conf contains:
>   geom_mirror_load="YES"
>   kern.geom.label.disk_ident.enable="0"
>   kern.geom.label.gptid.enable="0"
>   zfs_load="YES"
>   vboxdrv_load="YES"
>   hw.physmem="8G"
> 
> /etc/sysctl.conf contains:
>   kern.ipc.shm_allow_removed=1
> 
> No /etc/src.conf and nothing of that should matter in /etc/make.conf.
> 
> 
> This is what I see after running
>   poudriere ports -p whatever -u
> 
> last pid:  2102;  load averages:  0.24,  0.52,  0.36  up 0+00:06:54  14:13:51
> 52 processes:  1 running, 51 sleeping
> CPU:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
> Mem: 95M Active, 20M Inact, 1145M Wired, 39K Buf, 5580M Free
> ARC: 595M Total, 256M MFU, 248M MRU, 16K Anon, 14M Header, 78M Other
> Swap: 40G Total, 40G Free
> 
> No swap used, inactive memory low, no interactivity problems.  Next I'll
> try r297267, which is what I believe you are running.  I scanned the
> commit logs between r297204 and r297267 and didn't see anything terribly
> suspicious looking.

No problems here with r297267 either.  I did a bunch of small poudriere
runs since the system was first booted.  Usable RAM is still dialed back
to 8 GB.  A bit of swap is in use, mostly because nginx, which has been
unused since the system was booted, got swapped out.  Inactive memory is
low now that poudriere is done.

last pid: 75471;  load averages:  0.21,  0.15,  0.19  up 0+07:36:07  00:24:00
50 processes:  1 running, 49 sleeping
CPU:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
Mem: 5988K Active, 14M Inact, 2641M Wired, 41K Buf, 4179M Free
ARC: 790M Total, 575M MFU, 169M MRU, 16K Anon, 9618K Header, 36M Other
Swap: 40G Total, 50M Used, 40G Free

Do you use tmpfs?  Anything stored in there will get stashed in inactive
memory and/or swap.
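
(A quick way to see how much tmpfs is holding at any given moment - a
sketch; df's -t flag filters by filesystem type:)

    df -h -t tmpfs      # space held by each tmpfs mount
    mount -p -t tmpfs   # which mounts are tmpfs in the first place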



Re: CURRENT slow and shaky network stability

2016-03-29 Thread Don Lewis
On 28 Mar, Don Lewis wrote:
> On 28 Mar, O. Hartmann wrote:
 
> If I get a chance, I'll try booting my FreeBSD 11 machine with less
> RAM to see if that is a trigger.

I just tried cranking hw.physmem down to 8 GB on 11.0-CURRENT r297204,
GENERIC kernel.  /boot/loader.conf contains:
  geom_mirror_load="YES"
  kern.geom.label.disk_ident.enable="0"
  kern.geom.label.gptid.enable="0"
  zfs_load="YES"
  vboxdrv_load="YES"
  hw.physmem="8G"

/etc/sysctl.conf contains:
  kern.ipc.shm_allow_removed=1

No /etc/src.conf and nothing of that should matter in /etc/make.conf.


This is what I see after running
poudriere ports -p whatever -u

last pid:  2102;  load averages:  0.24,  0.52,  0.36  up 0+00:06:54  14:13:51
52 processes:  1 running, 51 sleeping
CPU:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
Mem: 95M Active, 20M Inact, 1145M Wired, 39K Buf, 5580M Free
ARC: 595M Total, 256M MFU, 248M MRU, 16K Anon, 14M Header, 78M Other
Swap: 40G Total, 40G Free

No swap used, inactive memory low, no interactivity problems.  Next I'll
try r297267, which is what I believe you are running.  I scanned the
commit logs between r297204 and r297267 and didn't see anything terribly
suspicious looking.


Are you doing any stateful firewalling?  If the firewall connection
state times out more quickly than the ssh keepalive interval, that would
break connections.
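
(If stateful firewalling is in play, a hedged workaround - a sketch, with
"gate" standing in for the affected host - is to send keepalives more often
than the firewall's state timeout, and to inspect ipfw's dynamic-rule
lifetimes:)

    # client-side ~/.ssh/config
    Host gate
        ServerAliveInterval 15   # probe every 15 s, well under the timeout
        ServerAliveCountMax 3    # drop the session after 3 missed probes

    # ipfw dynamic-state lifetimes and built-in keepalives
    sysctl net.inet.ip.fw.dyn_ack_lifetime net.inet.ip.fw.dyn_keepalive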


Re: CURRENT slow and shaky network stability

2016-03-29 Thread O. Hartmann
On Mon, 28 Mar 2016 14:52:09 -0700 (PDT)
Don Lewis  wrote:

> On 28 Mar, O. Hartmann wrote:
> > Am Sat, 26 Mar 2016 14:26:45 -0700 (PDT)
> > Don Lewis  schrieb:
> >   
> >> On 26 Mar, Michael Butler wrote:  
> >> > -current is not great for interactive use at all. The strategy of
> >> > pre-emptively dropping idle processes to swap is hurting .. big time.
> >> > 
> >> > Compare inactive memory to swap in this example ..
> >> > 
> >> > 110 processes: 1 running, 108 sleeping, 1 zombie
> >> > CPU:  1.2% user,  0.0% nice,  4.3% system,  0.0% interrupt, 94.5% idle
> >> > Mem: 474M Active, 1609M Inact, 764M Wired, 281M Buf, 119M Free
> >> > Swap: 4096M Total, 917M Used, 3178M Free, 22% Inuse
> >> > 
> >> >   PID USERNAME   THR PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
> >> >  1819 imb          1  28    0   213M 11284K select  1 147:44   5.97% gkrellm
> >> > 59238 imb         43  20    0   980M   424M select  0  10:07   1.92% firefox
> >> > 
> >> >  .. it shouldn't start randomly swapping out processes because they're
> >> > used infrequently when there's more than enough RAM to spare ..
> >> 
> >> I don't know what changed, and probably something can use some tweaking,
> >> but paging out idle processes isn't always the wrong thing to do.  For
> >> instance if I'm using poudriere to build a bunch of packages and its
> >> heavy use of tmpfs is pushing the machine into many GB of swap usage, I
> >> don't want interactive use like:
> >>vi foo.c
> >>cc foo.c
> >>vi foo.c
> >> to suffer because vi and cc have to be read in from a busy hard drive
> >> each time while unused console getty and idle sshd processes in a bunch
> >> of jails are still hanging on to memory even though they haven't
> >> executed any instructions since shortly after the machine was booted
> >> weeks ago.
> >>   
> >> > It also shows up when trying to reboot .. on all of my gear, 90 seconds
> >> > of "fail-safe" time-out is no longer enough when a good proportion of
> >> > daemons have been dropped onto swap and must be brought back in to flush
> >> > their data segments :-(
> >> 
> >> That's a different and known problem.  See:
> >> 
> >>   
> > 
> > CURRENT has become unusable and faulty. Updating ports for poudriere
> > ends up in this error/broken pipe on the remote console:
> > 
> >  [~] poudriere ports -u -p head
> > [00:00:00] >> Updating portstree "head"
> > [00:00:00] >> Updating the ports tree... done
> > root@gate [~] Fssh_packet_write_wait: Connection to 192.168.250.111 port
> > 22: Broken pipe
> > 
> > 
> > Although not under load, several processes get idled/paged out over
> > time - and they never recover; the connection is then sabotaged, the
> > whole thing unusable :-(
> 
> I'm definitely not seeing that here.  This is getting close to the end
> of a big poudriere run:
> 
> last pid: 82549;  load averages: 20.05, 20.72, 23.51  up 5+12:34:14  12:51:55
> 144 processes: 20 running, 109 sleeping, 15 stopped
> CPU: 85.3% user,  0.0% nice, 14.7% system,  0.0% interrupt,  0.0% idle
> Mem: 1082M Active, 19G Inact, 9718M Wired, 249M Buf, 1095M Free
> ARC: 3841M Total, 2039M MFU, 642M MRU, 3395K Anon, 111M Header, 1044M Other
> Swap: 40G Total, 9691M Used, 31G Free, 23% Inuse, 196K In
> 
> At the moment, openoffice-4, openoffice-devel, libreoffice, and chromium
> are all being built and are using tmpfs for "wrkdir data localbase", so
> there are many GB of data in tmpfs, which is the reason for the high
> inact and swap usage.  I just hit the return key in an idle (for a
> couple of hours) terminal window containing an ssh login session to the
> same machine.  I got a fresh command prompt essentially instantaneously.
> It couldn't have taken more than a couple hundred milliseconds to wake
> up and page in the idle sshd and shell processes on the build server.
> 
> [a couple hours later, after poudriere is done and all tmpfs is gone]
> 
> last pid: 66089;  load averages:  0.13,  1.59,  4.61  up 5+14:14:33  14:32:14
> 71 processes:  1 running, 55 sleeping, 15 stopped
> CPU:  3.1% user,  0.0% nice,  0.0% system,  0.0% interrupt, 96.9% idle
> Mem: 58M Active, 85M Inact, 12G Wired, 249M Buf, 19G Free
> ARC: 6249M Total, 2792M MFU, 2246M MRU, 16K Anon, 133M Header, 1078M Other
> Swap: 40G Total, 81M Used, 40G Free
> 
> [after tracking down and exiting all of those stopped processes]
> 
> last pid: 66103;  load averages:  0.20,  0.99,  3.80  up 5+14:17:18  14:34:59
> 56 processes:  1 running, 55 sleeping
> CPU:  0.0% user,  0.0% nice,  0.1% system,  0.1% interrupt, 99.9% idle
> Mem: 57M Active, 88M Inact, 12G Wired, 249M Buf, 19G Free
> ARC: 6251M Total, 2793M MFU, 2247M MRU, 16K Anon, 133M Header, 1078M Other
> Swap: 40G Total, 63M Used, 40G Free
> 
> The biggest chunk of the 63 MB of swap appears to be nginx.  Its
> process size is 29 MB, but it has zero resident.

Re: CURRENT slow and shaky network stability

2016-03-28 Thread Don Lewis
On 28 Mar, O. Hartmann wrote:
> Am Sat, 26 Mar 2016 14:26:45 -0700 (PDT)
> Don Lewis  schrieb:
> 
>> On 26 Mar, Michael Butler wrote:
>> > -current is not great for interactive use at all. The strategy of
>> > pre-emptively dropping idle processes to swap is hurting .. big time.
>> > 
>> > Compare inactive memory to swap in this example ..
>> > 
>> > 110 processes: 1 running, 108 sleeping, 1 zombie
>> > CPU:  1.2% user,  0.0% nice,  4.3% system,  0.0% interrupt, 94.5% idle
>> > Mem: 474M Active, 1609M Inact, 764M Wired, 281M Buf, 119M Free
>> > Swap: 4096M Total, 917M Used, 3178M Free, 22% Inuse
>> > 
>> >   PID USERNAME   THR PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
>> >  1819 imb          1  28    0   213M 11284K select  1 147:44   5.97% gkrellm
>> > 59238 imb         43  20    0   980M   424M select  0  10:07   1.92% firefox
>> > 
>> >  .. it shouldn't start randomly swapping out processes because they're
>> > used infrequently when there's more than enough RAM to spare ..  
>> 
>> I don't know what changed, and probably something can use some tweaking,
>> but paging out idle processes isn't always the wrong thing to do.  For
>> instance if I'm using poudriere to build a bunch of packages and its
>> heavy use of tmpfs is pushing the machine into many GB of swap usage, I
>> don't want interactive use like:
>>  vi foo.c
>>  cc foo.c
>>  vi foo.c
>> to suffer because vi and cc have to be read in from a busy hard drive
>> each time while unused console getty and idle sshd processes in a bunch
>> of jails are still hanging on to memory even though they haven't
>> executed any instructions since shortly after the machine was booted
>> weeks ago.
>> 
>> > It also shows up when trying to reboot .. on all of my gear, 90 seconds
>> > of "fail-safe" time-out is no longer enough when a good proportion of
>> > daemons have been dropped onto swap and must be brought back in to flush
>> > their data segments :-(  
>> 
>> That's a different and known problem.  See:
>> 
> 
> CURRENT has become unusable and faulty. Updating ports for poudriere
> ends up in this error/broken pipe on the remote console:
> 
>  [~] poudriere ports -u -p head
> [00:00:00] >> Updating portstree "head"
> [00:00:00] >> Updating the ports tree... done
> root@gate [~] Fssh_packet_write_wait: Connection to 192.168.250.111 port 22: Broken pipe
> 
> 
> Although not under load, several processes get idled/paged out over time -
> and they never recover; the connection is then sabotaged, the whole thing
> unusable :-(

I'm definitely not seeing that here.  This is getting close to the end
of a big poudriere run:

last pid: 82549;  load averages: 20.05, 20.72, 23.51  up 5+12:34:14  12:51:55
144 processes: 20 running, 109 sleeping, 15 stopped
CPU: 85.3% user,  0.0% nice, 14.7% system,  0.0% interrupt,  0.0% idle
Mem: 1082M Active, 19G Inact, 9718M Wired, 249M Buf, 1095M Free
ARC: 3841M Total, 2039M MFU, 642M MRU, 3395K Anon, 111M Header, 1044M Other
Swap: 40G Total, 9691M Used, 31G Free, 23% Inuse, 196K In

At the moment, openoffice-4, openoffice-devel, libreoffice, and chromium
are all being built and are using tmpfs for "wrkdir data localbase", so
there are many GB of data in tmpfs, which is the reason for the high
inact and swap usage.  I just hit the return key in an idle (for a
couple of hours) terminal window containing an ssh login session to the
same machine.  I got a fresh command prompt essentially instantaneously.
It couldn't have taken more than a couple hundred milliseconds to wake
up and page in the idle sshd and shell processes on the build server.

[a couple hours later, after poudriere is done and all tmpfs is gone]

last pid: 66089;  load averages:  0.13,  1.59,  4.61  up 5+14:14:33  14:32:14
71 processes:  1 running, 55 sleeping, 15 stopped
CPU:  3.1% user,  0.0% nice,  0.0% system,  0.0% interrupt, 96.9% idle
Mem: 58M Active, 85M Inact, 12G Wired, 249M Buf, 19G Free
ARC: 6249M Total, 2792M MFU, 2246M MRU, 16K Anon, 133M Header, 1078M Other
Swap: 40G Total, 81M Used, 40G Free

[after tracking down and exiting all of those stopped processes]

last pid: 66103;  load averages:  0.20,  0.99,  3.80  up 5+14:17:18  14:34:59
56 processes:  1 running, 55 sleeping
CPU:  0.0% user,  0.0% nice,  0.1% system,  0.1% interrupt, 99.9% idle
Mem: 57M Active, 88M Inact, 12G Wired, 249M Buf, 19G Free
ARC: 6251M Total, 2793M MFU, 2247M MRU, 16K Anon, 133M Header, 1078M Other
Swap: 40G Total, 63M Used, 40G Free

The biggest chunk of the 63 MB of swap appears to be nginx.  Its
process size is 29 MB, but it has zero resident.  It hasn't executed any
code since it was first started when I booted the system several days
ago.  Other consumers appear to be getty and sshd and syslogd in various
untouched jails.


I've seen reports that r296137 and r297267 show the ssh problem, but

Re: CURRENT slow and shaky network stability

2016-03-28 Thread O. Hartmann
Am Sat, 26 Mar 2016 14:26:45 -0700 (PDT)
Don Lewis  schrieb:

> On 26 Mar, Michael Butler wrote:
> > -current is not great for interactive use at all. The strategy of
> > pre-emptively dropping idle processes to swap is hurting .. big time.
> > 
> > Compare inactive memory to swap in this example ..
> > 
> > 110 processes: 1 running, 108 sleeping, 1 zombie
> > CPU:  1.2% user,  0.0% nice,  4.3% system,  0.0% interrupt, 94.5% idle
> > Mem: 474M Active, 1609M Inact, 764M Wired, 281M Buf, 119M Free
> > Swap: 4096M Total, 917M Used, 3178M Free, 22% Inuse
> > 
> >   PID USERNAME   THR PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
> >  1819 imb          1  28    0   213M 11284K select  1 147:44   5.97% gkrellm
> > 59238 imb         43  20    0   980M   424M select  0  10:07   1.92% firefox
> > 
> >  .. it shouldn't start randomly swapping out processes because they're
> > used infrequently when there's more than enough RAM to spare ..  
> 
> I don't know what changed, and probably something can use some tweaking,
> but paging out idle processes isn't always the wrong thing to do.  For
> instance if I'm using poudriere to build a bunch of packages and its
> heavy use of tmpfs is pushing the machine into many GB of swap usage, I
> don't want interactive use like:
>   vi foo.c
>   cc foo.c
>   vi foo.c
> to suffer because vi and cc have to be read in from a busy hard drive
> each time while unused console getty and idle sshd processes in a bunch
> of jails are still hanging on to memory even though they haven't
> executed any instructions since shortly after the machine was booted
> weeks ago.
> 
> > It also shows up when trying to reboot .. on all of my gear, 90 seconds
> > of "fail-safe" time-out is no longer enough when a good proportion of
> > daemons have been dropped onto swap and must be brought back in to flush
> > their data segments :-(  
> 
> That's a different and known problem.  See:
> 

CURRENT has become unusable and faulty. Updating ports for poudriere ends
up in this error/broken pipe on the remote console:

 [~] poudriere ports -u -p head
[00:00:00] >> Updating portstree "head"
[00:00:00] >> Updating the ports tree... done
root@gate [~] Fssh_packet_write_wait: Connection to 192.168.250.111 port 22: Broken pipe


Although not under load, several processes get idled/paged out over time -
and they never recover; the connection is then sabotaged, the whole thing
unusable :-(




Re: CURRENT slow and shaky network stability

2016-03-26 Thread Poul-Henning Kamp

In message <201603262331.u2qnvxvm080...@gw.catspoiler.org>, Don Lewis writes:

>> I am not running zfs.

Ohh, and I should probably add:  I don't have a swap-space configured.

-- 
Poul-Henning Kamp   | UNIX since Zilog Zeus 3.20
p...@freebsd.org | TCP/IP since RFC 956
FreeBSD committer   | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.


Re: CURRENT slow and shaky network stability

2016-03-26 Thread Don Lewis
On 26 Mar, Poul-Henning Kamp wrote:
> 
> In message 
> 
> , Ultima writes:
> 
>> A large zfs send [...]
> 
> I am not running zfs.

I am.  I'm not seeing any unexpected problems.  I haven't really loaded
this system up with a big poudriere run since it's been booted.  It's
mostly just been single-threaded compiles since then.  A bit of swap is
used and free memory is kind of low.  In my case the culprit seems to be
ARC.

last pid: 28589;  load averages:  1.06,  1.09,  1.08  up 3+15:55:10  16:12:51
71 processes:  2 running, 63 sleeping, 6 stopped
CPU:  7.0% user,  0.0% nice,  0.9% system,  0.0% interrupt, 92.1% idle
Mem: 564M Active, 6787M Inact, 23G Wired, 277M Buf, 656M Free
ARC: 16G Total, 5494M MFU, 2062M MRU, 499K Anon, 573M Header, 8434M Other
Swap: 40G Total, 4108K Used, 40G Free

When I'm making heavy use of poudriere, swap usage goes up a lot and the
pressure on free memory decreases the ARC size significantly.

It's a headless machine accessed via ssh, so I won't see any issues with
Xorg and its clients getting paged out.  Remote shell access seems to
perform ok for me.

Before r297203 I was running r296416 and really pushed it hard.  I
didn't observe any unexpected interactivity issues even with more than
10G of swap used and a load average over 50.  I'd love to add more RAM,
but the motherboard is at max capacity.

The problems that some are describing sound a lot like lost interrupts
or lost wakeups.  The former could easily be hardware dependent.

My FreeBSD 10.3-PRERELEASE desktop is worse under load.  It also uses
zfs, but only has 8 GB of RAM.  The main culprit is firefox, which gets
very bloated after a while.  I've got some other hoggish processes on
it, which doesn't help.




Re: CURRENT slow and shaky network stability

2016-03-26 Thread Poul-Henning Kamp

In message 
, Ultima writes:

> A large zfs send [...]

I am not running zfs.

-- 
Poul-Henning Kamp   | UNIX since Zilog Zeus 3.20
p...@freebsd.org | TCP/IP since RFC 956
FreeBSD committer   | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.


Re: CURRENT slow and shaky network stability

2016-03-26 Thread Ultima
 I'm having this long timeout issue occur many times during the day.
Normally it's not this bad; I'm currently on r297060 amd64. A few hours
ago I had a system hang that lasted for about 1-2 hours (not sure how long
exactly, I gave up waiting). It occurred after a zfs destroy operation and
affected sshd (I could no longer log in via ssh). The system was in a
seemingly unusable state during this time.

 A large zfs send was underway during the outage. That operation was
apparently not in the same stuck state, as the dataset on the receiving
system was still growing.

Not sure if it's related, but it sounds like it is. Any more information
that may be helpful?

On Sat, Mar 26, 2016 at 5:26 PM, Don Lewis  wrote:

> On 26 Mar, Michael Butler wrote:
> > -current is not great for interactive use at all. The strategy of
> > pre-emptively dropping idle processes to swap is hurting .. big time.
> >
> > Compare inactive memory to swap in this example ..
> >
> > 110 processes: 1 running, 108 sleeping, 1 zombie
> > CPU:  1.2% user,  0.0% nice,  4.3% system,  0.0% interrupt, 94.5% idle
> > Mem: 474M Active, 1609M Inact, 764M Wired, 281M Buf, 119M Free
> > Swap: 4096M Total, 917M Used, 3178M Free, 22% Inuse
> >
> >   PID USERNAME   THR PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
> >  1819 imb          1  28    0   213M 11284K select  1 147:44   5.97% gkrellm
> > 59238 imb         43  20    0   980M   424M select  0  10:07   1.92% firefox
> >
> >  .. it shouldn't start randomly swapping out processes because they're
> > used infrequently when there's more than enough RAM to spare ..
>
> I don't know what changed, and probably something can use some tweaking,
> but paging out idle processes isn't always the wrong thing to do.  For
> instance if I'm using poudriere to build a bunch of packages and its
> heavy use of tmpfs is pushing the machine into many GB of swap usage, I
> don't want interactive use like:
> vi foo.c
> cc foo.c
> vi foo.c
> to suffer because vi and cc have to be read in from a busy hard drive
> each time while unused console getty and idle sshd processes in a bunch
> of jails are still hanging on to memory even though they haven't
> executed any instructions since shortly after the machine was booted
> weeks ago.
>
> > It also shows up when trying to reboot .. on all of my gear, 90 seconds
> > of "fail-safe" time-out is no longer enough when a good proportion of
> > daemons have been dropped onto swap and must be brought back in to flush
> > their data segments :-(
>
> That's a different and known problem.  See:
> <https://svnweb.freebsd.org/base/releng/10.3/bin/csh/config_p.h?revision=297204&view=markup>
>


Re: CURRENT slow and shaky network stability

2016-03-26 Thread Don Lewis
On 26 Mar, Michael Butler wrote:
> -current is not great for interactive use at all. The strategy of
> pre-emptively dropping idle processes to swap is hurting .. big time.
> 
> Compare inactive memory to swap in this example ..
> 
> 110 processes: 1 running, 108 sleeping, 1 zombie
> CPU:  1.2% user,  0.0% nice,  4.3% system,  0.0% interrupt, 94.5% idle
> Mem: 474M Active, 1609M Inact, 764M Wired, 281M Buf, 119M Free
> Swap: 4096M Total, 917M Used, 3178M Free, 22% Inuse
> 
>   PID USERNAME   THR PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
>  1819 imb          1  28    0   213M 11284K select  1 147:44   5.97% gkrellm
> 59238 imb         43  20    0   980M   424M select  0  10:07   1.92% firefox
> 
>  .. it shouldn't start randomly swapping out processes because they're
> used infrequently when there's more than enough RAM to spare ..

I don't know what changed, and probably something can use some tweaking,
but paging out idle processes isn't always the wrong thing to do.  For
instance if I'm using poudriere to build a bunch of packages and its
heavy use of tmpfs is pushing the machine into many GB of swap usage, I
don't want interactive use like:
vi foo.c
cc foo.c
vi foo.c
to suffer because vi and cc have to be read in from a busy hard drive
each time while unused console getty and idle sshd processes in a bunch
of jails are still hanging on to memory even though they haven't
executed any instructions since shortly after the machine was booted
weeks ago.

> It also shows up when trying to reboot .. on all of my gear, 90 seconds
> of "fail-safe" time-out is no longer enough when a good proportion of
> daemons have been dropped onto swap and must be brought back in to flush
> their data segments :-(

That's a different and known problem.  See:
<https://svnweb.freebsd.org/base/releng/10.3/bin/csh/config_p.h?revision=297204&view=markup>
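
(Until that is resolved, a stopgap - a sketch, assuming the 90-second limit
being hit is the rc shutdown timer - is to raise it in /etc/rc.conf:)

    rcshutdown_timeout="300"   # default is 90 s; give swapped-out daemons
                               # time to page back in and flush on shutdown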


Re: CURRENT slow and shaky network stability

2016-03-26 Thread Don Lewis
On 26 Mar, Poul-Henning Kamp wrote:
> 
> In message 
> 
> , Adrian Chadd writes:
> 
> I can second that -current isn't too great right now, and I also see
> the breaking SSH sessions.
> 
> I'm running:
> 
>   FreeBSD 11.0-CURRENT #32 r296137: Sat Feb 27 11:34:01 UTC 2016
> 
> I repartitioned my disk, so I don't have the previous version around any
> more, but it was 4-6 weeks old, and I'm quite sure it didn't have the
> breaking ssh connections.
> 
> On another machine running:
> 
>   FreeBSD 11.0-CURRENT #4 r296808: Sun Mar 13 22:39:59 UTC 2016
> 
> I'm seeing weirdness with a 2TB and a 3TB external USB disk, in
> particular I/O aborts with this message:
> 
>   usb_pc_common_mem_cb: Page offset was not preserved
> 

No problems here with:
FreeBSD 11.0-CURRENT #37 r297204M: Tue Mar 22 23:29:06 PDT 2016



Re: CURRENT slow and shaky network stability

2016-03-26 Thread Poul-Henning Kamp

In message 
, Adrian Chadd writes:

I can second that -current isn't too great right now, and I also see
the breaking SSH sessions.

I'm running:

FreeBSD 11.0-CURRENT #32 r296137: Sat Feb 27 11:34:01 UTC 2016

I repartitioned my disk, so I don't have the previous version around any
more, but it was 4-6 weeks old, and I'm quite sure it didn't have the
breaking ssh connections.

On another machine running:

FreeBSD 11.0-CURRENT #4 r296808: Sun Mar 13 22:39:59 UTC 2016

I'm seeing weirdness with a 2TB and a 3TB external USB disk, in
particular I/O aborts with this message:

usb_pc_common_mem_cb: Page offset was not preserved

-- 
Poul-Henning Kamp   | UNIX since Zilog Zeus 3.20
p...@freebsd.org | TCP/IP since RFC 956
FreeBSD committer   | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.


Re: CURRENT slow and shaky network stability

2016-03-26 Thread Adrian Chadd
hiya,

can you identify a revision where it /doesn't/ do broken pipe? That'd
be the best way to start debugging this and figure out which revision
broke things.
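
(One way to run that bisection by hand - a sketch, assuming /usr/src is a
subversion checkout; GOOD stands for whatever known-good revision turns up:)

    svnlite update -r GOOD /usr/src        # check out the candidate revision
    cd /usr/src && make -j4 buildworld buildkernel
    # boot it; if ssh stays up, move GOOD halfway toward the bad revision
    # and repeat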

I haven't updated to the latest -HEAD on anything just yet;
everything's a few weeks old.

Thanks,


-a


On 25 March 2016 at 13:30, O. Hartmann  wrote:
> Since a couple of days now, FreeBSD CURRENT (at the moment with FreeBSD
> 11.0-CURRENT #9 r297267: Fri Mar 25 09:48:07 CET 2016 amd64) "feels"
> kind of shaky and like "glue": it is slow with X11, sometimes ssh
> connections even to nearby hosts on the same net, not under load, take
> some time (~ 1 - 3 seconds) to respond to keys in xterm or on the
> console (vt()), and I receive very often "broken pipe" on ssh
> connections to a host nearby. I noticed this strange behaviour on a
> couple of systems I maintain running most recent CURRENT.
>
> I also notice a high usage of swap on an 8 GB RAM, 2-core box with two
> ZFS volumes (one 3 TB HD and one 4 TB HD with ZFS). Using Firefox on X11
> (nVidia 364.12/355.11 driver, I checked with both) and running only a
> desktop (windowmaker) brings the system to using 12 or sometimes several
> hundred megabytes of swap - and I do not see what is using so much
> space.
>
> Does anyone else observe this phenomenon?
>
> Regards,
>
> oh


Re: CURRENT slow and shaky network stability

2016-03-26 Thread O. Hartmann
Am Sat, 26 Mar 2016 13:28:16 -0400
Michael Butler  schrieb:

> -current is not great for interactive use at all. The strategy of
> pre-emptively dropping idle processes to swap is hurting .. big time.

What is the gain then?

If this "feature" results in corrupted ssh sessions, slow console sessions
or, even worse, prolonged compilation times, then it is a big fail!

> [top(1) snapshot and reboot observations trimmed; quoted in full in
> Michael Butler's message below]





Re: CURRENT slow and shaky network stability

2016-03-26 Thread K. Macy
Sorry, I meant inpcb - autocorrect "fixed" it.

On Saturday, March 26, 2016, O. Hartmann wrote:

> [O. Hartmann's reply trimmed; quoted in full in his own message below]


Re: CURRENT slow and shaky network stability

2016-03-26 Thread Michael Butler
-current is not great for interactive use at all. The strategy of
pre-emptively dropping idle processes to swap is hurting .. big time.

Compare inactive memory to swap in this example ..

110 processes: 1 running, 108 sleeping, 1 zombie
CPU:  1.2% user,  0.0% nice,  4.3% system,  0.0% interrupt, 94.5% idle
Mem: 474M Active, 1609M Inact, 764M Wired, 281M Buf, 119M Free
Swap: 4096M Total, 917M Used, 3178M Free, 22% Inuse

  PID USERNAME  THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
 1819 imb         1  28    0   213M 11284K select  1 147:44  5.97% gkrellm
59238 imb        43  20    0   980M   424M select  0  10:07  1.92% firefox

 .. it shouldn't start randomly swapping out processes because they're
used infrequently when there's more than enough RAM to spare ..
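
One way to put numbers behind the inactive-vs-swap comparison is to sample
the VM counters at an interval and watch whether swap-outs occur while free
pages remain. A minimal sketch in C using sysctlbyname(3); the vm.stats.vm.*
names below are the stock FreeBSD counters, but verify them on the box in
question with "sysctl vm.stats.vm":

#include <sys/types.h>
#include <sys/sysctl.h>
#include <stdio.h>
#include <unistd.h>

/* Read an unsigned int sysctl by name; 0 on failure keeps this short. */
static unsigned int
get_uint(const char *name)
{
        unsigned int v = 0;
        size_t len = sizeof(v);

        if (sysctlbyname(name, &v, &len, NULL, 0) != 0)
                return (0);
        return (v);
}

int
main(void)
{
        unsigned int pgsz = get_uint("hw.pagesize");
        unsigned int pages_per_mb;

        if (pgsz == 0)
                pgsz = 4096;            /* fallback assumption */
        pages_per_mb = 1048576 / pgsz;

        printf("%9s %9s %10s %10s\n",
            "free(MB)", "inact(MB)", "swpin(pg)", "swpout(pg)");
        for (;;) {
                printf("%9u %9u %10u %10u\n",
                    get_uint("vm.stats.vm.v_free_count") / pages_per_mb,
                    get_uint("vm.stats.vm.v_inactive_count") / pages_per_mb,
                    get_uint("vm.stats.vm.v_swappgsin"),
                    get_uint("vm.stats.vm.v_swappgsout"));
                sleep(5);
        }
}

If swpout climbs while free(MB) stays comfortably high, that supports the
claim that processes are paged out despite spare RAM; if free dips toward a
threshold first, the page-outs are ordinary memory pressure.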

It also shows up when trying to reboot .. on all of my gear, 90 seconds
of "fail-safe" time-out is no longer enough when a good proportion of
daemons have been dropped onto swap and must be brought back in to flush
their data segments :-(
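
As a stopgap for the shutdown problem - not a fix for the paging behaviour -
that 90-second watchdog is the standard rc.conf(5) rcshutdown_timeout knob
and can be raised; the value below is arbitrary:

# /etc/rc.conf - give swapped-out daemons longer to page back in
# before the shutdown watchdog kills the rc sequence
rcshutdown_timeout="300"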

Michael



Re: CURRENT slow and shaky network stability

2016-03-26 Thread O. Hartmann
On Fri, 25 Mar 2016 13:31:31 -0700, "K. Macy" wrote:

> Does this pre or postage input changes?

???

First of all, the most visible fact is that ssh connections with high
terminal i/o (compiling world, poudriere bulk ...) very often receive a
broken pipe. The "native" console of the systems in question (non-UEFI,
but with drm2/i915kms loaded, iGPU of an IvyBridge XEON) is like "glue" -
responses are time-shifted. This happens on all CURRENT systems.
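
Not a fix, but a quick way to tell an idle-connection drop from a genuine
stall: client-side keepalives make ssh probe the server at a fixed interval
and fail deterministically once probes go unanswered. A minimal ~/.ssh/config
sketch using standard OpenSSH options (the host alias is hypothetical):

# ~/.ssh/config - probe every 15 s, give up after 4 missed replies
Host buildbox
    ServerAliveInterval 15
    ServerAliveCountMax 4
    TCPKeepAlive yes

If sessions then die after roughly a minute with a "server not responding"
timeout, the peer stopped answering; if they still break at arbitrary times
with a broken pipe, the teardown is happening elsewhere.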
 
>
> On Friday, March 25, 2016, O. Hartmann wrote:
>
> > [original message trimmed; quoted in full above]





Re: CURRENT slow and shaky network stability

2016-03-25 Thread K. Macy
Does this pre or postage input changes?

On Friday, March 25, 2016, O. Hartmann wrote:

> [original message trimmed; quoted in full above]