Re: panic: biodone2 already
Robert Elz wrote:
> This suggests to me that something is getting totally scrambled in
> the buf headers when things get busy.

I tried dumping the buf_t before the panic to check whether it was
completely corrupted, but that does not seem to be the case. b_blkno is
4904744, and the filesystem has 131891200 blocks.

bp = 0xa5c1e000
bp->b_error = 0
bp->b_resid = 0
bp->b_flags = 0x10
bp->b_prio = 1
bp->b_bufsize = 2048
bp->b_bcount = 2048
bp->b_dev = 0x8e00
bp->b_blkno = 4904744
bp->b_proc = 0x0
bp->b_saveaddr = 0x0
bp->b_private = 0x0
bp->b_dcookie = 0
bp->b_refcnt = 1
bp->b_lblkno = 0
bp->b_cflags = 0x10
bp->b_vp = 0xa5c2b2a8
bp->b_oflags = 0x200
panic: biodone2 already

db{0}> show vnode 0xa5c2b2a8
OBJECT 0xa5c2b2a8: locked=1, pgops=0x8058ac00, npages=0, refs=2

vnode 0xa5c2b2a8 flags 0x30
tag VT_UFS(1) type VDIR(2) mount 0xa448c000 typedata 0x0
usecount 2 writecount 0 holdcount 1
size 200 writesize 200 numoutput 0
data 0xa5c2cf00 lock 0xa5c2b3d8
state LOADED key(0xa448c000 8) 3a 9f 04 00 00 00 00 00
lrulisthd 0x8067bb80

--
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org
Re: panic: biodone2 already
For what it is worth, and in this case it might not be much, I did a
similar test on my test amd64 DomU last night. Running dump and
/etc/daily in parallel did nothing, but running lots of them in parallel
eventually did cause a crash.

But I saw a different crash: rather than a biodone2, I got a KASSERT from
one I added as part of attempting to diagnose the babylon5 "install
failures". That is, if my test kernel ever gets an I/O error, the KASSERT
(which is just KASSERT(0)) causes a crash. This was intended for generic
kernels that run in qemu, but I use the same sources for my generic
testing and simply left it there. My DomU test system *never* gets an I/O
error, so it simply did not matter (its filesystem is on a raid on the
Dom0, and neither the Dom0 nor the raid reports anything even smelling
like I/O errors; what's more, the Dom0 is more likely to crash than ever
allow a real I/O error through to the DomU).

This is the I/O error that occurred...

[ 485570.8105971] xbd0a: error writing fsbn 49691936 of 49691936-49691951 (xbd0 bn 49691936; cn 24263 tn 0 sn 1312)
panic: kernel diagnostic assertion "0" failed: file "/readonly/release/testing/src/sys/kern/subr_disk.c", line 163

What's kind of interesting about that is that the DomU filesystem is ...

format  FFSv1
endian  little-endian
magic   11954           time    Wed Aug  8 03:57:00 2018
superblock location     8192    id      [ 57248bd0 6db5a772 ]
cylgrp  dynamic inodes  4.4BSD  sblock  FFSv2   fslevel 4
nbfree  4037762 ndir    2334    nifree  2009341 nffree  3116
ncg     624     size    33554432        blocks  33289830

(no idea how I managed to make it FFSv1, but that doesn't matter).

What is interesting is "blocks 33289830" when compared with the I/O
error "error writing fsbn 49691936", which is 16402106 blocks beyond the
end of the filesystem.

This suggests to me that something is getting totally scrambled in the
buf headers when things get busy.

kre
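Both kre's comparison and the check Emmanuel did on b_blkno in the buf dump boil down to a range check on a transfer. A minimal sketch follows, under stated assumptions: the helper names are hypothetical, and the sectors_per_unit parameter exists because the messages do not say whether the dumpfs "blocks" figure and the driver's fsbn are counted in the same unit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: end of the filesystem, in 512-byte sectors. */
static uint64_t
fs_end_sector(uint64_t fs_size, uint64_t sectors_per_unit)
{
	assert(sectors_per_unit > 0);	/* a unit is at least one sector */
	return fs_size * sectors_per_unit;
}

/*
 * Hypothetical: does a transfer of nsectors sectors starting at
 * start_sector lie entirely inside a filesystem of fs_size units,
 * each unit being sectors_per_unit sectors?  Pass 1 when both
 * figures are already in sectors.
 */
static bool
xfer_in_range(uint64_t start_sector, uint64_t nsectors,
    uint64_t fs_size, uint64_t sectors_per_unit)
{
	uint64_t end = fs_end_sector(fs_size, sectors_per_unit);

	/* Written this way so start_sector + nsectors cannot overflow. */
	return start_sector < end && nsectors <= end - start_sector;
}

/* Hypothetical: how far past the end start_sector lies (0 if inside). */
static uint64_t
sectors_past_end(uint64_t start_sector, uint64_t fs_size,
    uint64_t sectors_per_unit)
{
	uint64_t end = fs_end_sector(fs_size, sectors_per_unit);

	return start_sector >= end ? start_sector - end : 0;
}
```

With sectors_per_unit = 1, sectors_past_end(49691936, 33289830, 1) reproduces the 16402106 figure, and the 16-sector write 49691936-49691951 is out of range. If the dumpfs figure turns out to be in larger units, the verdict can change (with 4 sectors per unit, the same write is in range), so the unit question is worth settling before concluding that the block number is bogus.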
Re: panic: biodone2 already
Jaromír Doleček wrote:
> Thanks. Could you please try a -current kernel for DOMU and see if it
> crashes the same way? If possible, a DOMU kernel from the daily builds,
> to rule out a local compiler issue.

It crashes the same way with a kernel from 201808050730. Here is the
uname -a output:

NetBSD bacasable 8.99.23 NetBSD 8.99.23 (XEN3PAE_DOMU) #0: Sun Aug 5 06:48:50 UTC 2018 mkre...@mkrepro.netbsd.org:/usr/src/sys/arch/xen/compile/XEN3PAE_DOMU i386

> xbd is not mpsafe, so it shouldn't even be a race due to parallel
> processing on different CPUs. Maybe it would be useful to check if the
> problem still happens when you assign just a single CPU to the DOMU.

I get the crash with vcpu = 1 for the domU. I also tried pinning a single
CPU for the test domU, and I can still get it to crash:

xl vcpu-pin bacasable 0 0
xl vcpu-pin $other_domU all 1-3

An interesting point: adding the -X flag to dump seems to let it work
without a panic. It may be luck, but it has not crashed so far.

--
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org
Re: panic: biodone2 already
2018-08-07 18:42 GMT+02:00 Emmanuel Dreyfus:
> kern/53506

Thanks. Could you please try a -current kernel for DOMU and see if it
crashes the same way? If possible, use a DOMU kernel from the daily
builds, to rule out a local compiler issue.

There are not really many differences in the xbd/evtchn code itself
between 8.0 and -current, however. There was some interrupt code
reorganization which might have affected this, but it happened after
netbsd-8 was created.

xbd is not mpsafe, so it shouldn't even be a race due to parallel
processing on different CPUs. Maybe it would be useful to check if the
problem still happens when you assign just a single CPU to the DOMU.

Jaromir
Re: mutex_oncpu() called on destroyed mutex?
> Disabling preemption only affects the CPU that disabled it.

But preemption is *en*abled in the code segment I quoted!

> What's the stack trace of the panic?

mutex_vector_enter() at netbsd:mutex_vector_enter+0x32c
unp_thread() at netbsd:unp_thread+0x2eb
Re: mutex_oncpu() called on destroyed mutex? (was: repeated panics in mutex_vector_enter (from unp_thread))
> On Aug 7, 2018, at 9:44 AM, Edgar Fuß wrote:
>
> I observe this on 6.1, but I can't see the relevant code changed in -current.
>
> mutex_vector_enter() does (-current uses KPREEMPT_* macros)
>
>         do {
>                 kpreempt_enable();
>                 SPINLOCK_BACKOFF(count);
>                 kpreempt_disable();
>                 owner = mtx->mtx_owner;
>         } while (mutex_oncpu(owner));
>
> and my problem seems to be owner == MUTEX_THREAD (i.e. the mutex destroyed)
> at the time mutex_oncpu(owner) is called.
>
> My understanding of locking is limited (close to zero), but why shouldn't
> the mutex in question be destroyed during the preemption-enabled period?
>
> I must be missing something.

It could be destroyed by another thread on a different CPU. Disabling
preemption only affects the CPU that disabled it.

Sounds like this is just a classic use-after-free problem. What's the
stack trace of the panic? Is the mutex embedded in some ephemeral data
structure?

-- thorpej
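thorpej's diagnosis is a use-after-free: another CPU destroys (and presumably frees) the structure holding the mutex while this CPU is still spinning on it. One common way to rule that out is to reference-count the containing object so that the lock is destroyed only when the last user lets go. The sketch below is a userland model using POSIX threads and C11 atomics, not the NetBSD kernel's mutex(9) interface; struct obj and the obj_* functions are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical ephemeral object with an embedded lock. */
struct obj {
	pthread_mutex_t	lock;	/* protects the object's other fields */
	atomic_uint	refs;	/* number of outstanding references */
};

struct obj *
obj_create(void)
{
	struct obj *o = malloc(sizeof(*o));

	if (o == NULL)
		return NULL;
	pthread_mutex_init(&o->lock, NULL);
	atomic_init(&o->refs, 1);	/* the creator holds the first reference */
	return o;
}

/* Take an extra reference before handing o to another thread. */
void
obj_hold(struct obj *o)
{
	atomic_fetch_add(&o->refs, 1);
}

/*
 * Drop a reference.  Only the caller that drops the last one destroys
 * the lock and frees the memory.  Returns 1 if this call destroyed the
 * object, 0 otherwise.
 */
int
obj_release(struct obj *o)
{
	unsigned int old = atomic_fetch_sub(&o->refs, 1);

	assert(old > 0);	/* releasing more than was held is a bug */
	if (old != 1)
		return 0;
	pthread_mutex_destroy(&o->lock);
	free(o);
	return 1;
}
```

The guarantee only holds if every thread that may touch o->lock takes its own reference first (obj_hold) and releases it only after its last use of the lock; a thread spinning on the lock without a reference is exactly the situation described above.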
Re: ddb input via IPMI virtual console
Hello. Sorry, my description wasn't clear. Since you have an IPMI-capable
server, you should be able to turn serial port redirection on in the BIOS
so that com1 (from NetBSD's point of view) becomes a virtual port which is
accessible using the ipmitool program. You would do something like:

ipmitool -H 10.10.1.3 -U ADMIN -I lanplus sol activate

After you enter the password, you should be connected to the virtual
serial port, where you can see output or type input. Since this is a
serial port as far as NetBSD is concerned, DDB should work. This is a
separate session from your virtual console, so you can run it in a
separate window. Change the username and IP address shown above to match
your setup.

To get NetBSD to use that serial port as a console, you'd do something like:

cd /usr/mdec
installboot -v -o speed=115200 -o console=com1 /dev/boot bootxx_ffsv

-Brian

On Aug 7, 11:14am, Edgar Fuß wrote:
} Subject: Re: ddb input via IPMI virtual console
} > how about using a serial console in the kernel and then using ipmitool
} > to talk to DDB when/if the machine goes down?
} I don't have a serial wire through the firewall.
}
} > but kernel messages won't go there.
} It would be an awful drawback not to see the kernel messages on a physical
} console.
>-- End of excerpt from Edgar Fuß
mutex_oncpu() called on destroyed mutex? (was: repeated panics in mutex_vector_enter (from unp_thread))
I observe this on 6.1, but I can't see the relevant code changed in
-current. mutex_vector_enter() does (-current uses KPREEMPT_* macros)

        do {
                kpreempt_enable();
                SPINLOCK_BACKOFF(count);
                kpreempt_disable();
                owner = mtx->mtx_owner;
        } while (mutex_oncpu(owner));

and my problem seems to be owner == MUTEX_THREAD (i.e. the mutex
destroyed) at the time mutex_oncpu(owner) is called.

My understanding of locking is limited (close to zero), but why shouldn't
the mutex in question be destroyed during the preemption-enabled period?

I must be missing something.
Re: panic: biodone2 already
Martin Husemann wrote:
> Please file a PR.

kern/53506

--
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org
Re: panic: biodone2 already
Emmanuel Dreyfus wrote:
> /sbin/dump -a0f /dev/null /
> sh /etc/daily

The second command can be replaced by a simple

grep -r something /etc

But so far I have not managed to trigger the crash without running dump.

--
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org
Re: panic: biodone2 already
On Tue, Aug 07, 2018 at 05:30:27PM +0200, Emmanuel Dreyfus wrote:
> I can reproduce the crash at will: running the following two commands
> at the same time reliably triggers "panic: biodone2 already"
>
> /sbin/dump -a0f /dev/null /
> sh /etc/daily

Please file a PR.

Martin
Re: panic: biodone2 already
Jaromír Doleček wrote:
> This is always a bug: the driver processes the same buf twice. It can
> do harm. If the buf is reused for some other I/O, the system can fail
> to store data, or claim to read data when it didn't.

I can reproduce the crash at will: running the following two commands at
the same time reliably triggers "panic: biodone2 already":

/sbin/dump -a0f /dev/null /
sh /etc/daily

It crashes NetBSD 8.0 domUs, either i386 or amd64. At my site, the only
machines that have escaped the bug so far are those where the dump has
finished by the time /etc/daily starts. All the others keep rebooting at
least once a day.

--
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org
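What Jaromír describes, a driver completing the same buf twice, is the kind of mistake a "done" flag checked at completion time catches. The sketch below models such a check in userland under stated assumptions: struct iobuf and the iobuf_* functions are hypothetical stand-ins, not NetBSD's struct buf or biodone(9), and where the kernel panics, the model simply returns -1.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical, much-simplified stand-in for a buffer header. */
struct iobuf {
	pthread_mutex_t	lock;
	bool		done;	/* set once the I/O has completed */
	int		error;	/* completion status (0 or an errno value) */
};

void
iobuf_init(struct iobuf *bp)
{
	pthread_mutex_init(&bp->lock, NULL);
	bp->done = false;
	bp->error = 0;
}

/*
 * Mark bp complete.  A second completion of the same request is the
 * driver bug behind the panic; a kernel would panic, this model
 * returns -1 and leaves the recorded status untouched.
 */
int
iobuf_complete(struct iobuf *bp, int error)
{
	int rv = 0;

	assert(error >= 0);	/* 0 for success, else a positive errno */
	pthread_mutex_lock(&bp->lock);
	if (bp->done) {
		rv = -1;	/* "already" done: completed twice */
	} else {
		bp->done = true;
		bp->error = error;
	}
	pthread_mutex_unlock(&bp->lock);
	return rv;
}

/* Prepare bp for reuse by a new I/O request. */
void
iobuf_reset(struct iobuf *bp)
{
	pthread_mutex_lock(&bp->lock);
	bp->done = false;
	bp->error = 0;
	pthread_mutex_unlock(&bp->lock);
}
```

The check only helps while the buffer still belongs to the same request. Once it has been reset for a new I/O, a late duplicate completion from the old request looks like a legitimate one and silently marks the new request done, which is the data-loss scenario Jaromír warns about.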
Re: ddb input via IPMI virtual console
>> how about using a serial console in the kernel and then using
>> ipmitool to talk to DDB when/if the machine goes down?
> I don't have a serial wire through the firewall.

You have a multiconductor cable; while it's intended for PS/2, if it has
at least three conductors (which I believe PS/2 does), it can be used
perfectly well as a serial line. You'll need adaptors on the ends of the
cable, but they need be only passive adaptors. RS-232 is ridiculously
tolerant of layer-1 issues. Unless the insulation is rated for only 6V or
something, which strikes me as unlikely enough that I wouldn't even
bother checking if it were me in that situation.

/~\ The ASCII                             Mouse
\ / Ribbon Campaign
 X  Against HTML                mo...@rodents-montreal.org
/ \ Email!           7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B
Re: repeated panics in mutex_vector_enter (from unp_thread)
Could someone please help me figure out how to debug this? The server
repeatedly panics the same way after only hours or even minutes of uptime!
Re: NetBSD 8.0 dead socket
Emmanuel Dreyfus wrote:
> Hello

Hello,

> Another bug I observed with NetBSD 8.0. So far it happened only to
> named, but perhaps it is more generic.

Very strange. I use a NetBSD server as main router/NFS server/name server
with bind9 and NetBSD 8.0 (built from source), and I haven't seen this
kind of trouble.

> named is bound to multiple addresses. After some time, it ceases to
> answer on one of them, while the others keep working. tcpdump shows
> the packets going to port 53 without reply. ktrace shows no data
> coming from the kernel. And netstat shows named is still bound to the
> offending address.
>
> Has anyone else experienced it? Any idea where to look?

I have seen a strange bug when a socket is opened in IPv4 _and_ IPv6 in
8-RC1 or 8-BETA; I don't remember which.

Best regards,

JKB
Re: ddb input via IPMI virtual console
On Tue, Aug 07, 2018 at 11:19:28AM +0200, Edgar Fuß wrote:
> > Put it in the machine room, use the existing PS/2 keyboards, and... isn't
> > the problem solved?
> ... as I've been told, DDB isn't able to talk to USB keyboards (or did I
> get that wrong?). So I would end up with neither IPMI nor a real console working.

ddb should be able to talk to the console keyboard (via polling), but not
to additional keyboards.

Martin
Re: 8.0 performance issue when running build.sh?
> On 6. Aug 2018, at 23:18, Mindaugas Rasiukevicius wrote:
>
> Martin Husemann wrote:
>> So here is a more detailed analysis using flamegraphs:
>>
>> https://netbsd.org/~martin/bc-perf/
>>
>> All operations happen on tmpfs, and the locking there is a significant
>> part of the game; however, lots of mostly idle time is wasted and it is
>> not clear to me what is going on.
>
> Just from a very quick look, it seems like a regression introduced with
> the vcache changes: the MP-safe flag is set too late and not inherited
> by the root vnode.
>
> http://www.netbsd.org/~rmind/tmpfs_mount_fix.diff

Very good catch. @martin, could you try this diff on an autobuilder?
It looks like it speeds up tmpfs lookups by a factor of ~40 on -8.0.

--
J. Hannken-Illjes - hann...@eis.cs.tu-bs.de - TU Braunschweig (Germany)
Re: ddb input via IPMI virtual console
> Since the problem is that the real keyboards are PS/2, the adapters sound
> perfect.

Ah, yes, that sounds like a perfect solution, but ...

> Put it in the machine room, use the existing PS/2 keyboards, and... isn't
> the problem solved?

... as I've been told, DDB isn't able to talk to USB keyboards (or did I
get that wrong?). So I would end up with neither IPMI nor a real console
working.
Re: ddb input via IPMI virtual console
On Tue, Aug 07, 2018 at 11:14:12AM +0200, Edgar Fuß wrote:
> > how about using a serial console in the kernel and then using ipmitool
> > to talk to DDB when/if the machine goes down?
> I don't have a serial wire through the firewall.

You configure the kernel for serial console output and use IPMI to talk
to it.

Martin
Re: ddb input via IPMI virtual console
> how about using a serial console in the kernel and then using ipmitool
> to talk to DDB when/if the machine goes down?

I don't have a serial wire through the firewall.

> but kernel messages won't go there.

It would be an awful drawback not to see the kernel messages on a
physical console.
RE: ddb input via IPMI virtual console
>> The real keyboards are PS/2 and I can't change that because it runs
>> on a wire physically passing a /real/ firewall, [...]
>
> (a) Is it possible to run USB over the same conductors used by the PS/2
> cable? (This is a real question; I don't know enough about layer 1 of
> either to answer it.)

Great question. The answer is "yes, if you're desperate". Keyboards are
low-speed or full-speed USB devices, and the signaling is relatively
non-critical, especially if you find a low-speed keyboard. However:

> (b) There exist devices that adapt PS/2 to USB in the
> PS/2-keyboard-to-USB-host direction.

Since the problem is that the real keyboards are PS/2, the adapters sound
perfect.

https://www.newegg.com/Mouse-Keyboard-PS2-Adapters/SubCategory/ID-3024

About US$6, which is pretty reasonable.

Put it in the machine room, use the existing PS/2 keyboards, and... isn't
the problem solved?

If you need USB as well from another source, use a hub (in the machine
room). But I confess I'm not clear about that part of what you're trying
to do.

--Terry