Re: crash of OpenBSD 6.3 -stable (amd64 MP kernel) - unswapping kills connections

2018-05-02 Thread Infoomatic
That's good news, thanks Philip for the info! In the meantime I disabled
swap (as well as ntopng) on my firewalls - swap is of course not needed
on a firewall and was just a left-over from the initial default install.

regards,
infoomatic

Sent: Friday, 27 April 2018 at 13:50
From: "Philip Guenther"
To: Infoomatic
Cc: "OpenBSD Misc"
Subject: Re: crash of OpenBSD 6.3 -stable (amd64 MP kernel) - unswapping
kills connections

On Thu, Apr 26, 2018 at 11:21 PM, Infoomatic wrote:

  thanks for your input! Actually, I was never really satisfied with
  the stability of ntopng, so this problem of the memory leak does not
  really surprise me. However, when killing the process, which also
  means freeing swap space, I think it is not an expected behaviour
  that the system does not handle any tcp/ip or icmp connections any
  more until the swap space is fully freed (which, in my case when
  ntopng used 3 out of 4GB swap, lasted for nearly 20 minutes). IMHO,
  unswapping a process should not influence network connectivity that
  much.

You're correct that we don't want the clean up of an exiting process to
affect network processing.  The issue is that our UVM is still under the
kernel lock; work into using more fine-grained locking there has begun
but nothing has really hit the tree yet.

Philip Guenther


Re: crash of OpenBSD 6.3 -stable (amd64 MP kernel) - unswapping kills connections

2018-04-27 Thread Philip Guenther
On Thu, Apr 26, 2018 at 11:21 PM, Infoomatic  wrote:
>
> thanks for your input! Actually, I was never really satisfied with the
> stability of ntopng, so this problem of the memory leak does not really
> surprise me. However, when killing the process, which also means freeing
> swap space, I think it is not an expected behaviour that the system does
> not handle any tcp/ip or icmp connections any more until the swap space is
> fully freed (which, in my case when ntopng used 3 out of 4GB swap, lasted
> for nearly 20 minutes). IMHO, unswapping a process should not influence
> network connectivity that much.
>

You're correct that we don't want the clean up of an exiting process to
affect network processing.  The issue is that our UVM is still under the
kernel lock; work into using more fine-grained locking there has begun but
nothing has really hit the tree yet.


Philip Guenther


Re: crash of OpenBSD 6.3 -stable (amd64 MP kernel) - unswapping kills connections

2018-04-27 Thread Stuart Longland
On 27/04/18 07:21, Infoomatic wrote:
> I think it is not an expected behaviour that the system does not handle any 
> tcp/ip or icmp connections any more until the swap space is fully freed 
> (which, in my case when ntopng used 3 out of 4GB swap, lasted for nearly 20
> minutes). IMHO, unswapping a process should not influence network 
> connectivity that much.

If physical memory is full, and virtual memory is full, where do you
suppose the kernel should buffer incoming network traffic?

-- 
Stuart Longland (aka Redhatter, VK4MSL)

I haven't lost my mind...
  ...it's backed up on a tape somewhere.



Re: crash of OpenBSD 6.3 -stable (amd64 MP kernel) - unswapping kills connections

2018-04-27 Thread Otto Moerbeek
On Thu, Apr 26, 2018 at 11:21:57PM +0200, Infoomatic wrote:

> Hi Stuart, 

> thanks for your input! Actually, I was never really satisfied with
> the stability of ntopng, so this problem of the memory leak does not
> really surprise me. However, when killing the process, which also
> means freeing swap space, I think it is not an expected behaviour that
> the system does not handle any tcp/ip or icmp connections any more
> until the swap space is fully freed (which, in my case when ntopng
> used 3 out of 4GB swap, lasted for nearly 20 minutes). IMHO,
> unswapping a process should not influence network connectivity that
> much.

Please wrap your lines.

Depending on how you killed it, it's way more than unswapping: the
program likely accesses (all) its data while shutting down, and as a
consequence is swapping in gigs of data, possibly repeatedly.

If the working set is larger than physical memory, all bets are off.

-Otto
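
For a sense of scale, the figures reported in the thread (about 3GB of
swap in use, roughly 20 minutes until the machine responded again) can be
turned into an average page-in rate. This is only a back-of-the-envelope
sketch: it assumes every swapped page was read back exactly once and that
pages are 4 KiB, as on amd64. The function name is made up for
illustration. If pages were swapped in repeatedly, as suggested above, the
real I/O volume would be higher.

```python
def swap_in_rate_mib_per_s(swapped_gib: float, minutes: float) -> float:
    """Average rate needed to move `swapped_gib` GiB in `minutes` minutes."""
    return swapped_gib * 1024 / (minutes * 60)


# Figures from the thread: ~3 GB of swap used, ~20 minutes of stall.
rate = swap_in_rate_mib_per_s(3, 20)

# Assumption: 4 KiB pages, each read back from swap exactly once.
pages_per_s = rate * 1024 / 4

print(f"{rate:.2f} MiB/s, about {pages_per_s:.0f} pages/s")
```

A rate this low is what one would expect from many small scattered reads
rather than sequential disk throughput, which fits the point that the
exiting process was faulting its data back in.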


> 
> Regards,
> infoomatic
> 
> 
> > Sent: Thursday, 26 April 2018 at 16:10
> > From: "Stuart Henderson" <s...@spacehopper.org>
> > To: misc@openbsd.org
> > Subject: Re: crash of OpenBSD 6.3 -stable (amd64 MP kernel) - unswapping
> > kills connections
> >
> > On 2018-04-26, Infoomatic <infooma...@gmx.at> wrote:
> > > Hi,
> > >
> > > Today I discovered some interesting details: I guess ntopng has a memory 
> > > leak, thus eating all my 4GB RAM and some 3GB swap - this appeared in the 
> > > morning, so after all the backups and heavy traffic occurred.
> > > When I fired up a rcctl stop ntopng the ssh connection stalled. The 
> > > firewall could not handle further connections, and established 
> > > connections dropped. The system could not answer to ping packets etc.
> > > This now also happened on a 2nd machine. After 20 minutes (when I was in 
> > > a taxi to the datacenter) I could login again and realized that ntopng 
> > > was stopped and swap was freed.
> > >
> > > I have now disabled ntopng. I kindly ask the devs to take a look at this! 
> > > If you need a testsetup for this or if I can do anything, just contact me.
> > 
> > First off, it's not a big surprise to have a hanging machine if you
> > run it out of memory.
> > 
> > ntopng is not really stable. There is a newer version upstream but it
> > crashes very often with certain packet types suggesting bugs in the packet
> > parsers.
> > 
> > If you run ntopng at all, I would recommend you only run it while you
> > need to investigate traffic, not leave it running unattended permanently.
> > 
> > It might also be a good idea to set login.conf limits for it, if you
> > start it via the rc.d script you can add an "ntopng" class with say
> > datasize=2500M.
> > 
> > 
> > 



Re: crash of OpenBSD 6.3 -stable (amd64 MP kernel) - unswapping kills connections

2018-04-26 Thread Infoomatic
Hi Stuart,

thanks for your input! Actually, I was never really satisfied with the 
stability of ntopng, so this problem of the memory leak does not really 
surprise me. However, when killing the process, which also means freeing swap 
space, I think it is not an expected behaviour that the system does not handle 
any tcp/ip or icmp connections any more until the swap space is fully freed 
(which, in my case when ntopng used 3 out of 4GB swap, lasted for nearly 20
minutes). IMHO, unswapping a process should not influence network connectivity 
that much.

Regards,
infoomatic


> Sent: Thursday, 26 April 2018 at 16:10
> From: "Stuart Henderson" <s...@spacehopper.org>
> To: misc@openbsd.org
> Subject: Re: crash of OpenBSD 6.3 -stable (amd64 MP kernel) - unswapping
> kills connections
>
> On 2018-04-26, Infoomatic <infooma...@gmx.at> wrote:
> > Hi,
> >
> > Today I discovered some interesting details: I guess ntopng has a memory 
> > leak, thus eating all my 4GB RAM and some 3GB swap - this appeared in the 
> > morning, so after all the backups and heavy traffic occurred.
> > When I fired up a rcctl stop ntopng the ssh connection stalled. The 
> > firewall could not handle further connections, and established connections 
> > dropped. The system could not answer to ping packets etc.
> > This now also happened on a 2nd machine. After 20 minutes (when I was in a 
> > taxi to the datacenter) I could login again and realized that ntopng was 
> > stopped and swap was freed.
> >
> > I have now disabled ntopng. I kindly ask the devs to take a look at this! 
> > If you need a testsetup for this or if I can do anything, just contact me.
> 
> First off, it's not a big surprise to have a hanging machine if you
> run it out of memory.
> 
> ntopng is not really stable. There is a newer version upstream but it
> crashes very often with certain packet types suggesting bugs in the packet
> parsers.
> 
> If you run ntopng at all, I would recommend you only run it while you
> need to investigate traffic, not leave it running unattended permanently.
> 
> It might also be a good idea to set login.conf limits for it, if you
> start it via the rc.d script you can add an "ntopng" class with say
> datasize=2500M.
> 
> 
> 



Re: crash of OpenBSD 6.3 -stable (amd64 MP kernel) - unswapping kills connections

2018-04-26 Thread Stuart Henderson
On 2018-04-26, Infoomatic  wrote:
> Hi,
>
> Today I discovered some interesting details: I guess ntopng has a memory 
> leak, thus eating all my 4GB RAM and some 3GB swap - this appeared in the 
> morning, so after all the backups and heavy traffic occurred.
> When I fired up a rcctl stop ntopng the ssh connection stalled. The firewall 
> could not handle further connections, and established connections dropped. 
> The system could not answer to ping packets etc.
> This now also happened on a 2nd machine. After 20 minutes (when I was in a 
> taxi to the datacenter) I could login again and realized that ntopng was 
> stopped and swap was freed.
>
> I have now disabled ntopng. I kindly ask the devs to take a look at this! If 
> you need a testsetup for this or if I can do anything, just contact me.

First off, it's not a big surprise to have a hanging machine if you
run it out of memory.

ntopng is not really stable. There is a newer version upstream but it
crashes very often with certain packet types suggesting bugs in the packet
parsers.

If you run ntopng at all, I would recommend you only run it while you
need to investigate traffic, not leave it running unattended permanently.

It might also be a good idea to set login.conf limits for it, if you
start it via the rc.d script you can add an "ntopng" class with say
datasize=2500M.
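
A login class along the lines suggested above might look like the
following fragment. This is a sketch, not a tested configuration: the
class name and the datasize value come from the message above, while
inheriting the remaining settings from the "daemon" class via tc= is an
assumption added here.

```
# /etc/login.conf (fragment)
# Cap the data segment of ntopng so that a leak fails allocations
# instead of filling RAM and swap.
ntopng:\
	:datasize=2500M:\
	:tc=daemon:
```

If /etc/login.conf.db exists it has to be rebuilt with cap_mkdb(1) after
the edit, and the daemon restarted (e.g. rcctl restart ntopng) for the
new limit to take effect. With the limit in place, a leaking ntopng
should hit allocation failures at around 2500M rather than pushing the
machine into heavy swapping.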




Re: crash of OpenBSD 6.3 -stable (amd64 MP kernel) - unswapping kills connections

2018-04-26 Thread Infoomatic
Hi,

Today I discovered some interesting details: I guess ntopng has a memory leak, 
thus eating all my 4GB RAM and some 3GB swap - this appeared in the morning, so 
after all the backups and heavy traffic occurred.
When I fired up a rcctl stop ntopng the ssh connection stalled. The firewall 
could not handle further connections, and established connections dropped. The 
system could not answer to ping packets etc.
This now also happened on a 2nd machine. After 20 minutes (when I was in a taxi 
to the datacenter) I could login again and realized that ntopng was stopped and 
swap was freed.

I have now disabled ntopng. I kindly ask the devs to take a look at this! If 
you need a testsetup for this or if I can do anything, just contact me.

Regards,
infoomatic



> Sent: Wednesday, 25 April 2018 at 15:25
> From: Infoomatic
> To: misc@openbsd.org, b...@openbsd.org
> Subject: crash of OpenBSD 6.3 -stable (amd64 MP kernel)
>
> Hi folks,
> 
> Unfortunately this is not a complete bug report since I could not retrieve
> relevant information, however [1] is the dmesg.
> I upgraded to the new OpenBSD 6.3 version on Monday, however, today it
> crashed - better: it hung completely. I could not reach it any more via ssh, 
> a ping needed 15 seconds instead of 19ms, and only some packets arrived at 
> the host - but the network was normal.
> The machine runs the standard services from the default install plus httpd 
> and relayd, and also third party software: OpenVPN, scanlogd and ntopng.
> 
> In the sysctl.conf I have set ddb.panic=0.
> 
> When I was physically standing in front of the machine I was expecting to see 
> some messages on the screen, or even ddb, so to get some info for the devs, 
> but this was not the case.
> I plugged in a PS/2 keyboard with a USB adapter and promptly got on my
> screen (without the "date hostname" - took this from the log):
> Apr 25 13:28:21 dorie /bsd: uhub0: device problem, disabling port 1
> 
> I tried another USB port and got:
> Apr 25 13:29:34 dorie /bsd: uhub0: device problem, disabling port 10
> 
> The keyboard was not working on the machine, so I grabbed another one. I 
> plugged it in and suddenly the monitor was filled up with messages which kept 
> flooding and did not stop:
> scsi_xfer pool exhausted!
> 
> I then had to reset the machine. 
> 
> I also found suspicious messages in the log at about the time when the 
> machine became unresponsive:
> Apr 25 11:23:00 dorie relayd[31883]: rsae_send_imsg: poll timeout
> Apr 25 11:23:00 dorie relayd[96425]: rsae_send_imsg: poll timeout
> Apr 25 11:23:11 dorie relayd[39081]: rsae_send_imsg: poll timeout
> Apr 25 11:23:16 dorie relayd[96425]: rsae_send_imsg: poll timeout
> Apr 25 11:23:28 dorie relayd[96425]: relay: proc_dispatch: relay 1 got 
> invalid imsg 59 peerid -1 from ca 1
> Apr 25 11:23:34 dorie relayd[31883]: rsae_send_imsg: poll timeout
> Apr 25 11:23:42 dorie relayd[31883]: relay: pipe closed
> Apr 25 11:23:43 dorie relayd[39081]: rsae_send_imsg: imsg_flush: Broken pipe
> Apr 25 11:23:44 dorie relayd[39081]: relay: pipe closed
> 
> Maybe some devs have an idea where to look for a bug. Any tips how to deal
> with this matter in the future?
> 
> TIA and regards,
> infoomatic
> 
> 
> [1] 
> OpenBSD 6.3 (GENERIC.MP) #107: Sat Mar 24 14:21:59 MDT 2018
> dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> real mem = 4238319616 (4041MB)
> avail mem = 4102795264 (3912MB)
> mpath0 at root
> scsibus0 at mpath0: 256 targets
> mainbus0 at root
> bios0 at mainbus0: SMBIOS rev. 2.7 @ 0xebb80 (74 entries)
> bios0: vendor American Megatrends Inc. version "0801" date 08/20/2014
> bios0: Thomas-Krenn.AG P9D-MV(X) Series
> acpi0 at bios0: rev 2
> acpi0: sleep states S0 S3 S4 S5
> acpi0: tables DSDT FACP APIC FPDT SSDT SSDT SSDT SSDT MCFG HPET SSDT SSDT 
> BERT DMAR EINJ ERST HEST
> acpi0: wakeup devices PXSX(S4) RP01(S4) PXSX(S4) RP02(S4) PXSX(S4) RP03(S4) 
> PXSX(S4) PXSX(S4) PXSX(S4) PXSX(S4) PXSX(S4) GLAN(S4) EHC1(S4) EHC2(S4) 
> XHC_(S4) HDEF(S4) [...]
> acpitimer0 at acpi0: 3579545 Hz, 24 bits
> acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
> cpu0 at mainbus0: apid 0 (boot processor)
> cpu0: Intel(R) Xeon(R) CPU E3-1230L v3 @ 1.80GHz, 2594.38 MHz
> cpu0: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,PERF,ITSC,FSGSBASE,BMI1,HLE,AVX2,SMEP,BMI2,ERMS,INVPCID,RTM,SENSOR,ARAT,MELTDOWN
> cpu0: 256KB 64b/line 8-way L2 cache
> acpitimer0: recalibrated TSC frequency 1795841682 Hz
> cpu0: smt 0, core 0, package 0
> mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges
> cpu0: apic clock running at 99MHz
> cpu0: mwait min=64, max=64, C-substates=0.2.1.2.4, IBE
> cpu1 at mainbus0: apid 2 (application processor)
> cpu1: