Re: OpenBSD 6.4 amd64: Kernel panic ffs_blkfree

2019-04-03 Thread Pietro Stäheli
Hi Claudio,

On 01/04/2019 15:06, Claudio Jeker wrote:
> In general the ffs_blkfree panic indicates that the filesystem is
> inconsistent. In most cases this is because the HW is dying.

Thanks for the hint, we found something like that.

It looks like the problem is that the iSCSI-connected storage isn't
entirely stable, so VMware loses its storage for a short time. This
happens only rarely and I don't currently know how to reproduce the
behavior. But you're right, it's dying hardware, just virtual :)

Because this has only happened on the OpenBSD VM, and none of the VMs
running other operating systems on the same host were affected, I
assumed it was an OpenBSD issue. Maybe Linux is less picky about
temporarily inaccessible storage, so nobody ever noticed anything.

> Lately these reports running with VMware seem to have increased.
> This may indicate that something with the disk emulation is not quite right.
> Since this is VMware specific a good way to reproduce the issue is needed.
> I have never seen such a panic on real mpi(4) attached disks so something
> on the VMware side is not behaving like a real mpi(4).
> 
> Not sure if it is possible to use a different disk emulation in VMware
> or use a different version and see if that behaves better.

VMware offers a few other SCSI controller types to try, but I'd need to
find time for proper testing to see whether the behavior differs. It may
be better to invest that time in fixing the underlying storage issue
instead.

Best regards,
Pietro



openBGPd crashes in 6.2 and 6.3: "a politician in the decision process"

2018-09-10 Thread Pietro Stäheli
Hi Remi,

Sorry I didn't see your email sooner, Gmail decided to classify it as spam.

> I move this over to bugs@ which feels more appropriate to me. Chances that
> a bgpd developer sees it are higher.
> 
> Somehow there were two identical paths and bgpd could not figure out
> which one to prefer. The 13 steps bgpd goes through to find the best
> path are listed in the bgpd man page.

That is the conclusion we arrived at as well.

> Do you have a pcap file with the bgp traffic or an mrt dump? (see dump in
> man bgpd.conf). This could answer the question why you had two identical
> paths.
> 

We haven't got any dumps from when the problem happened. Would there
still be a point now that the RS is stable?
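In case it happens again, the dumps could be enabled ahead of time. A
sketch of what that might look like in bgpd.conf, based on the dump
statement described in bgpd.conf(5) (the file paths and intervals below
are placeholders, not from our actual config):

```
# Record all incoming UPDATE messages as MRT, rotating the file
# every 300 seconds; the filename is expanded via strftime(3)
dump all in "/var/log/bgpd/all-in-%H%M" 300
```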

> Showing your config and your regular bgpctl commands could also help with
> analyzing the problem.

The bgpctl command is as follows; it counts the number of
non-established peers:

bgpctl show | grep ' 00:0' | grep -v "/" | wc -l
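To illustrate what the pipeline does: it keeps lines whose Up/Down timer
is under ten minutes (matching " 00:0") and drops lines containing a "/"
(established peers, which show a received/total prefix count). A
self-contained sketch against made-up sample lines (not real bgpctl
output, peer names and columns are hypothetical):

```shell
# Hypothetical, abbreviated `bgpctl show` summary lines
sample='peer1   64501    1234    5678  0 00:04:12 Active
peer2   64502    4321    8765  0 12:33:01 100/200
peer3   64503     999    1111  0 00:02:45 Idle'

# Same filter as the monitoring one-liner: a short timer and no
# prefix count means the session is not established
count=$(printf '%s\n' "$sample" | grep ' 00:0' | grep -v "/" | wc -l)
echo "$count"
```

Here peer1 and peer3 are counted; peer2 is established (it shows a
"100/200" prefix count) and is excluded.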


Slightly censored bgpd.conf:

https://pastebin.com/mqjpjHCx

On the other RS we added "announce restart no" to the configuration, as
advised earlier, to see whether that makes a difference if the problem
recurs.

Best regards,
Pietro