Re: Kernel panic on Dell R210 with OpenBSD 6.0 (relayd related ?)

2017-07-25 Thread Mathieu BLANC
On Tue, May 02, 2017 at 05:03:20PM +, Stuart Henderson wrote:
> Probably the best thing to do at this point is to write a mail to bugs@:
> 
> 1. describe what the machine is doing in detail. carp? ipsec? pfsync?
> what sort of relays? include config (sanitized if necessary, but do that
> consistently).
> 
> 2. copy in the panic message and stack trace as text (re-type it,
> don't attach a picture or send a link to a picture).
> 
> 3. make it a self-contained report with description etc all in the one
> message, don't rely on people having message history.
> 
> 4. include dmesg.

Hi Stuart, 

Thanks for your answer!
I haven't had time to work on this since early May.
But from time to time I check the commits to pf.c, and I saw this one, which
seemed to match my bug perfectly:
http://cvsweb.openbsd.org/cgi-bin/cvsweb/src/sys/net/pf.c?rev=1.1035&content-type=text/x-cvsweb-markup

I tried the diff, and it seems to be OK! I can't trigger the bug anymore (it
was 100% reproducible before).

So, thank you again, and special thanks to bluhm@ who made the patch!

-- 
Mathieu



Re: Kernel panic on Dell R210 with OpenBSD 6.0 (relayd related ?)

2017-05-02 Thread Stuart Henderson
On 2017-05-02, Mathieu BLANC wrote:
> On Wed, Mar 29, 2017 at 02:06:23PM +0200, Mathieu BLANC wrote:
>> It also kernel panics with just these pf rules:
>> # cat pf_minimal.conf 
>> set limit { states 10 }  
>> set skip on lo   
>> anchor "relayd/*"
>> pass 
>> 
>
> I upgraded the system to 6.1 release last week, the kernel panic is still here
> (with the same logs).

Probably the best thing to do at this point is to write a mail to bugs@:

1. describe what the machine is doing in detail. carp? ipsec? pfsync?
what sort of relays? include config (sanitized if necessary, but do that
consistently).

2. copy in the panic message and stack trace as text (re-type it,
don't attach a picture or send a link to a picture).

3. make it a self-contained report with description etc all in the one
message, don't rely on people having message history.

4. include dmesg.
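
A minimal sketch of assembling such a self-contained report, assuming the description and the re-typed panic trace have already been saved as text files and the dmesg output was saved beforehand (for example with `dmesg > dmesg.txt`). The function name `build_report` and the file names are hypothetical and only serve the illustration:

```shell
#!/bin/sh
# Hypothetical helper: concatenate the pieces of a bug report into one
# plain-text message body, so the mail stands on its own.
# Usage: build_report DESCRIPTION_FILE TRACE_FILE DMESG_FILE
build_report() {
    desc=$1
    trace=$2
    dmesg_out=$3
    # Refuse to build a partial report if any piece is missing.
    for f in "$desc" "$trace" "$dmesg_out"; do
        if [ ! -r "$f" ]; then
            echo "build_report: cannot read $f" >&2
            return 1
        fi
    done
    echo "Description:"
    cat "$desc"
    echo
    echo "Panic message and stack trace:"
    cat "$trace"
    echo
    echo "dmesg:"
    cat "$dmesg_out"
}
```

It could be used as `build_report description.txt trace.txt dmesg.txt > report.txt`, with the result pasted as text into the mail to bugs@. On OpenBSD, sendbug(1) serves a similar purpose.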




Re: Kernel panic on Dell R210 with OpenBSD 6.0 (relayd related ?)

2017-05-02 Thread Mathieu BLANC
On Tue, May 02, 2017 at 03:44:43PM +0200, Andre Ruppert wrote:
> Hi,
> 
> I'm running 6.0 amd64 on a pair of R210 with relayd, but these are R210 (II).
> 
> No kernel panics at all, and these systems are working in a live
> environment...
> 
> Regards
> Andre

Hi,

Yes, I also have several OpenBSD machines on R210 + 6.0 (or 6.1) + relayd, and
they work like a charm.

The only problem appeared when an admin did a REJECT (iptables) on one of the
hosts checked by relayd with check tcp (I tried to put all the details I could
in the previous mails).

The next step is to try with -current (until now I was waiting for the 6.1
release, which was very close).

-- 
Mathieu



Re: Kernel panic on Dell R210 with OpenBSD 6.0 (relayd related ?)

2017-05-02 Thread Andre Ruppert

Hi,

I'm running 6.0 amd64 on a pair of R210 with relayd, but these are R210 (II).

No kernel panics at all, and these systems are working in a live 
environment...


Regards
Andre



On 02.05.17 at 15:03, Mathieu BLANC wrote:
> On Wed, Mar 29, 2017 at 02:06:23PM +0200, Mathieu BLANC wrote:
>> It also kernel panics with just these pf rules:
>> # cat pf_minimal.conf
>> set limit { states 10 }
>> set skip on lo
>> anchor "relayd/*"
>> pass
>
> I upgraded the system to 6.1 release last week, the kernel panic is still here
> (with the same logs).







Re: Kernel panic on Dell R210 with OpenBSD 6.0 (relayd related ?)

2017-05-02 Thread Mathieu BLANC
On Wed, Mar 29, 2017 at 02:06:23PM +0200, Mathieu BLANC wrote:
> It also kernel panics with just these pf rules:
> # cat pf_minimal.conf 
> set limit { states 10 }  
> set skip on lo   
> anchor "relayd/*"
> pass 
> 

I upgraded the system to 6.1 release last week, the kernel panic is still here
(with the same logs).

-- 
Mathieu



Re: Kernel panic on Dell R210 with OpenBSD 6.0 (relayd related ?)

2017-03-29 Thread Mathieu BLANC
On Wed, Mar 29, 2017 at 10:40:08AM +0200, Mathieu BLANC wrote:
> On Tue, Mar 28, 2017 at 05:58:02PM +0200, Hiltjo Posthuma wrote:
> > On Tue, Mar 28, 2017 at 02:39:44PM +0200, Mathieu BLANC wrote:
> > > On Tue, Mar 28, 2017 at 02:22:28PM +0200, Mathieu BLANC wrote:
> > > > I can reproduce the bug (on the slave firewall) as many times as I want.
> > > > 
> > > 
> > > I've just read https://www.openbsd.org/ddb.html and saw that you need a trace
> > > for all cpu.
> > > 
> > > http://www.hostingpics.net/viewer.php?id=238876panic9.jpg
> > > http://www.hostingpics.net/viewer.php?id=275943panic10.jpg
> > > http://www.hostingpics.net/viewer.php?id=375143panic11.jpg
> > > http://www.hostingpics.net/viewer.php?id=220012panic12.jpg
> > > 
> > > (it's a different crash from the last screenshots i've made, if it's not good i
> > > can provide a full new set of pics)
> > > 
> > > -- 
> > > Mathieu
> > > 
> > 
> > Hey,
> > 
> > Can you also provide your pf.conf ?
> > 
> > Can you test if it also happens on -current?
> > 
> > -- 
> > Kind regards,
> > Hiltjo
> 
> Hello,
> 
> Unfortunately, i can't provide pf.conf as is (too many references to customers,
> ips, etc...). But i think i can work on a minimal file which triggers the bug.
> I'll see that.
> 

It also kernel panics with just these pf rules:
# cat pf_minimal.conf 
set limit { states 10 }  
set skip on lo   
anchor "relayd/*"
pass 

-- 
Mathieu



Re: Kernel panic on Dell R210 with OpenBSD 6.0 (relayd related ?)

2017-03-29 Thread Mathieu BLANC
On Tue, Mar 28, 2017 at 05:58:02PM +0200, Hiltjo Posthuma wrote:
> On Tue, Mar 28, 2017 at 02:39:44PM +0200, Mathieu BLANC wrote:
> > On Tue, Mar 28, 2017 at 02:22:28PM +0200, Mathieu BLANC wrote:
> > > I can reproduce the bug (on the slave firewall) as many times as I want.
> > > 
> > 
> > I've just read https://www.openbsd.org/ddb.html and saw that you need a trace
> > for all cpu.
> > 
> > http://www.hostingpics.net/viewer.php?id=238876panic9.jpg
> > http://www.hostingpics.net/viewer.php?id=275943panic10.jpg
> > http://www.hostingpics.net/viewer.php?id=375143panic11.jpg
> > http://www.hostingpics.net/viewer.php?id=220012panic12.jpg
> > 
> > (it's a different crash from the last screenshots i've made, if it's not good i
> > can provide a full new set of pics)
> > 
> > -- 
> > Mathieu
> > 
> 
> Hey,
> 
> Can you also provide your pf.conf ?
> 
> Can you test if it also happens on -current?
> 
> -- 
> Kind regards,
> Hiltjo

Hello,

Unfortunately, I can't provide pf.conf as is (too many references to customers,
IPs, etc.). But I think I can work on a minimal file which triggers the bug.
I'll look into that.

For -current, my idea was to try it only if I didn't get any response on the
list for -stable.

But for now, we don't have any -current in production so I'm not sure :)

I know there are plenty of people who run -current, and I'm pretty confident in
it, but it's more a question of procedure, for example how to follow -current
efficiently over time. With -release and -stable it's pretty simple: upgrade
every 6 months + a few patches and it's OK :)

-- 
Mathieu



Re: Kernel panic on Dell R210 with OpenBSD 6.0 (relayd related ?)

2017-03-28 Thread Hiltjo Posthuma
On Tue, Mar 28, 2017 at 02:39:44PM +0200, Mathieu BLANC wrote:
> On Tue, Mar 28, 2017 at 02:22:28PM +0200, Mathieu BLANC wrote:
> > I can reproduce the bug (on the slave firewall) as many times as I want.
> > 
> 
> I've just read https://www.openbsd.org/ddb.html and saw that you need a trace
> for all cpu.
> 
> http://www.hostingpics.net/viewer.php?id=238876panic9.jpg
> http://www.hostingpics.net/viewer.php?id=275943panic10.jpg
> http://www.hostingpics.net/viewer.php?id=375143panic11.jpg
> http://www.hostingpics.net/viewer.php?id=220012panic12.jpg
> 
> (it's a different crash from the last screenshots i've made, if it's not good i
> can provide a full new set of pics)
> 
> -- 
> Mathieu
> 

Hey,

Can you also provide your pf.conf ?

Can you test if it also happens on -current?

-- 
Kind regards,
Hiltjo



Re: Kernel panic on Dell R210 with OpenBSD 6.0 (relayd related ?)

2017-03-28 Thread Mathieu BLANC
On Tue, Mar 28, 2017 at 02:22:28PM +0200, Mathieu BLANC wrote:
> I can reproduce the bug (on the slave firewall) as many times as I want.
> 

I've just read https://www.openbsd.org/ddb.html and saw that you need a trace
for all cpu.

http://www.hostingpics.net/viewer.php?id=238876panic9.jpg
http://www.hostingpics.net/viewer.php?id=275943panic10.jpg
http://www.hostingpics.net/viewer.php?id=375143panic11.jpg
http://www.hostingpics.net/viewer.php?id=220012panic12.jpg

(it's a different crash from the last screenshots i've made, if it's not good i
can provide a full new set of pics)

-- 
Mathieu



Re: Kernel panic on Dell R210 with OpenBSD 6.0 (relayd related ?)

2017-03-28 Thread Mathieu BLANC
On Tue, Mar 28, 2017 at 12:05:56PM +0300, Mihai Popescu wrote:
> Isn't there a message written in capitals on the screen at panic time?
> If not, look here:
> http://www.openbsd.org/report.html
> 

I can reproduce the bug (on the slave firewall) as many times as I want.

I made some screenshots. Sorry, I didn't manage to provide text logs (I'm on
the DRAC console).

In http://man.openbsd.org/OpenBSD-6.0/crash I saw that I might be able to get
the ddb logs in dmesg after a warm reboot, but it didn't work for me.

I don't know if you prefer HTTP links or attached files. I have uploaded the
JPGs here:
http://www.hostingpics.net/viewer.php?id=835545panic1.jpg
http://www.hostingpics.net/viewer.php?id=149061panic2.jpg
http://www.hostingpics.net/viewer.php?id=328015panic3.jpg
http://www.hostingpics.net/viewer.php?id=730910panic4.jpg
http://www.hostingpics.net/viewer.php?id=607164panic5.jpg
http://www.hostingpics.net/viewer.php?id=272177panic6.jpg
http://www.hostingpics.net/viewer.php?id=689399panic7.jpg
http://www.hostingpics.net/viewer.php?id=499214panic8.jpg

I can attach the files if you want.

Here is my relayd conf :

_front_vip="A.B.C.D"

_front1="E.F.G.H"
_front2="I.J.K.L"

table  { $_front1 $_front2 }

redirect _http_vip {
        listen on $_front_vip port http
        forward to  mode source-hash check tcp
        pftag RELAYD_VIP_NAT
}

On front1, if I run this command, my OpenBSD system crashes. With DROP instead
of REJECT it's OK (tested 5-6 times):
iptables -I INPUT -j REJECT -p tcp --dport 80 -s 

-- 
Mathieu



Re: Kernel panic on Dell R210 with OpenBSD 6.0 (relayd related ?)

2017-03-28 Thread Mihai Popescu
Isn't there a message written in capitals on the screen at panic time?
If not, look here:
http://www.openbsd.org/report.html



Re: Kernel panic on Dell R210 with OpenBSD 6.0 (relayd related ?)

2017-03-28 Thread Mathieu BLANC
On Mon, Mar 27, 2017 at 02:42:23PM +0200, Mathieu BLANC wrote:
> Hello all,
> 
> I have a pair of firewalls running 6.0 (patched with openup in october, no patch
> applied since then). 
> 
> Since the upgrade, this pair has some problem with kernel
> panics (4 times since the upgrade in october).
> 
> The last one was this morning. The two firewall crashed at the same time with
> these logs :
> 
> /bsd: panic: kernel diagnostic assertion "(sk->inp == NULL) || (sk->inp->inp_pf_sk == NULL)" failed: file "../../../../net/pf.c", line 6891
> /bsd: Starting stack trace...
> /bsd: panic() at panic+0x10b
> /bsd: __assert() at __assert+0x25
> /bsd: pf_state_key_unref() at pf_state_key_unref+0xc6
> /bsd: pf_pkt_unlink_state_key() at pf_pkt_unlink_state_key+0x15
> /bsd: m_free() at m_free+0xa0
> /bsd: sbdroprecord() at sbdroprecord+0x61
> /bsd: soreceive() at soreceive+0xb4f
> /bsd: recvit() at recvit+0x139
> /bsd: sys_recvfrom() at sys_recvfrom+0x9d
> /bsd: syscall() at syscall+0x27b
> /bsd: --- syscall (number 29) ---
> /bsd: end of kernel
> /bsd: end trace frame: 0x7f7dc870, count: 247
> /bsd: 0x18ccb3b21ada:
> /bsd: End of stack trace. 
> 

Hello,

This morning, another crash.

I found something very interesting in daemon.log. At the same second the
firewall crashed, the same resource checked by relayd went down:

Yesterday :
Mar 27 11:51:48 fw5 relayd[94179]: host W.X.Y.Z, check tcp (16010ms,tcp connect timeout), state up -> down, availability 99.94%
Mar 27 11:51:48 fw5 relayd[89662]: table _http_vip: 0 added, 1 deleted, 0 changed, 0 killed

This morning :
Mar 28 09:08:54 fw5 relayd[46733]: host W.X.Y.Z, check tcp (16010ms,tcp connect timeout), state up -> down, availability 99.95%
Mar 28 09:08:54 fw5 relayd[29633]: table _http_vip: 0 added, 1 deleted, 0 changed, 0 killed

I called the admin in charge of host W.X.Y.Z. What he had done was run an
iptables REJECT command on the host (to remove it from relayd). We tested
with DROP and it does not seem to trigger the bug (I'll try to run more tests).
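
To check whether other crashes line up with a host going down in the same way, the relayd transitions can be pulled out of the log and compared with the crash times. The following is only a sketch, assuming log lines in the same format as the excerpts above; the function name `relayd_down_events` is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: list relayd health-check transitions from up to down.
# Reads syslog lines on stdin and prints "<month> <day> <time> <host>"
# for every "state up -> down" line logged by relayd.
relayd_down_events() {
    awk '/relayd\[[0-9]+\]: host / && /state up -> down/ {
        host = $7
        sub(/,$/, "", host)      # strip the comma that follows the address
        print $1, $2, $3, host
    }'
}
```

For example, `relayd_down_events < daemon.log` (the log file name is an assumption taken from the message above) prints one line per transition, which can then be compared by hand with the timestamps of the panics.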