Re: panic: biodone2 already

2018-08-18 Thread Michael van Elst
On Fri, Aug 17, 2018 at 12:59:57PM +0200, Jaromir Dolecek wrote: > > yes, one of the problems is the code happily handles stale bufs. it does not > clear the buf pointer when the response is already handled. we should add > some KASSERTs there for this and clear the response structure on reuse.
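
The suggestion quoted above (clear the buf pointer once a response has been handled, and assert on stale entries) can be illustrated with a small userland sketch. This is not the xbd driver code: `mock_buf`, `mock_req`, `mock_submit` and `mock_handle_response` are hypothetical stand-ins, and the counter increment only stands in for the dk_done()/biodone() call. In a kernel, the NULL check would more likely be a KASSERT.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's buf: only counts completions. */
struct mock_buf {
	int done_count;		/* how many times completion ran for this buf */
};

/* Hypothetical stand-in for a driver request slot. */
struct mock_req {
	struct mock_buf *req_bp;	/* NULL when no I/O is in flight */
};

#define MOCK_NREQS 4

static struct mock_req mock_reqs[MOCK_NREQS];

/* Bind a buf to a free slot. Returns the slot id, or -1 if none is free. */
static int
mock_submit(struct mock_buf *bp)
{
	for (int i = 0; i < MOCK_NREQS; i++) {
		if (mock_reqs[i].req_bp == NULL) {
			mock_reqs[i].req_bp = bp;
			return i;
		}
	}
	return -1;
}

/*
 * Handle one response for slot `id`.
 * Returns 0 if the buf was completed, -1 if the id is out of range
 * or the slot is stale (already handled).
 */
static int
mock_handle_response(int id)
{
	if (id < 0 || id >= MOCK_NREQS)
		return -1;

	struct mock_req *req = &mock_reqs[id];
	struct mock_buf *bp = req->req_bp;

	if (bp == NULL)
		return -1;	/* stale response: a kernel would assert here */

	req->req_bp = NULL;	/* clear the pointer so a repeat is detectable */
	bp->done_count++;	/* stands in for dk_done()/biodone() */
	return 0;
}
```

With the pointer cleared on completion, a second response carrying the same id is rejected instead of completing the same buf twice.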

Re: panic: biodone2 already

2018-08-17 Thread Jaromir Dolecek
> Le 17 août 2018 à 07:07, Michael van Elst a écrit : > >> On Fri, Aug 17, 2018 at 02:23:16AM +, Emmanuel Dreyfus wrote: >> >>blkif_response_t *rep = RING_GET_RESPONSE(&sc->sc_ring, i); >>struct xbd_req *xbdreq = &sc->sc_reqs[rep->id]; >>bp

Re: panic: biodone2 already

2018-08-16 Thread Michael van Elst
On Fri, Aug 17, 2018 at 02:23:16AM +, Emmanuel Dreyfus wrote: > blkif_response_t *rep = RING_GET_RESPONSE(&sc->sc_ring, i); > struct xbd_req *xbdreq = &sc->sc_reqs[rep->id]; > bp = xbdreq->req_bp; > > It decides to call dk_done for the last occu

Re: panic: biodone2 already

2018-08-16 Thread Emmanuel Dreyfus
On Fri, Aug 10, 2018 at 06:55:46AM -, Michael van Elst wrote: > a queued operation eventually returns with a call to xbd_handler. > - for every buffer returned, dk_done is called which finally ends > in invoking biodone. After adding debug statements, I can now tell the offending buf_t is qu

Re: panic: biodone2 already

2018-08-10 Thread Michael van Elst
m...@netbsd.org (Emmanuel Dreyfus) writes: >I can tell that in vfs_bio.c, bread() -> bio_doread() will call >VOP_STRATEGY once for the offending buf_t, but biodone() is called twice >in interrupt context for the buf_t, leading to the biodone2 already >panic later. >Since you know the xbd code you

Re: panic: biodone2 already

2018-08-09 Thread Emmanuel Dreyfus
Emmanuel Dreyfus wrote: > > xbd is not mpsafe, so it shouldn't be even race due to parallel > > processing on different CPUs. Maybe it would be useful to check if the > > problem still happens when you assign just single CPU to the DOMU. > > I get the crash with vcpu = 1 for the domU. I

Re: panic: biodone2 already

2018-08-08 Thread Emmanuel Dreyfus
On Wed, Aug 08, 2018 at 10:30:23AM +0700, Robert Elz wrote: > This suggests to me that something is getting totally scrambled in > the buf headers when things get busy. I tried to crash with BIOHIST enabled, here is the story about the buf_t that triggers the panic because it has BO_DONE: 1533732322

Re: panic: biodone2 already

2018-08-07 Thread Emmanuel Dreyfus
Robert Elz wrote: > This suggests to me that something is getting totally scrambled in > the buf headers when things get busy. I tried dumping the buf_t before panic, to check if it could be completely corrupted, but it seems it is not the case. Iblkno is 4904744, filesystem has 131891200 block

Re: panic: biodone2 already

2018-08-07 Thread Robert Elz
For what it is worth, and in this case it might not be much, I did a similar test on my test amd64 DomU last night. Running dump and /etc/daily in parallel did nothing, but running lots of them in parallel eventually did cause a crash. But I saw a different crash -- rather than a biodone2, I got

Re: panic: biodone2 already

2018-08-07 Thread Emmanuel Dreyfus
Jaromír Doleček wrote: > Thanks. Could you please try a -current kernel for DOMU and see if it > crashes the same? If possible a DOMU kernel from daily builds, to rule > out local compiler issue. It crashes the same way with a kernel from 201808050730. here is uname -a output: NetBSD bacasable

Re: panic: biodone2 already

2018-08-07 Thread Jaromír Doleček
2018-08-07 18:42 GMT+02:00 Emmanuel Dreyfus : > kern/53506 Thanks. Could you please try a -current kernel for DOMU and see if it crashes the same? If possible a DOMU kernel from daily builds, to rule out local compiler issue. There are not really many differences in xbd/evtchn code itself betwe

Re: panic: biodone2 already

2018-08-07 Thread Emmanuel Dreyfus
Martin Husemann wrote: > Please file a PR. kern/53506 -- Emmanuel Dreyfus http://hcpnet.free.fr/pubz m...@netbsd.org

Re: panic: biodone2 already

2018-08-07 Thread Emmanuel Dreyfus
Emmanuel Dreyfus wrote: > /sbin/dump -a0f /dev/null / > sh /etc/daily The second command can be replaced by a simple grep -r something /etc But so far I did not manage to crash without running dump. -- Emmanuel Dreyfus http://hcpnet.free.fr/pubz m...@netbsd.org

Re: panic: biodone2 already

2018-08-07 Thread Martin Husemann
On Tue, Aug 07, 2018 at 05:30:27PM +0200, Emmanuel Dreyfus wrote: > I can reproduce the crash at will: running at the same time the two > following commands reliably trigger "panic biodone2 already" > > /sbin/dump -a0f /dev/null / > sh /etc/daily Please file a PR. Martin

Re: panic: biodone2 already

2018-08-07 Thread Emmanuel Dreyfus
Jaromír Doleček wrote: > This is always a bug, driver processes same buf twice. It can do harm. > If the buf is reused for some other I/O, system can fail to store > data, or claim to read data when it didn't. I can reproduce the crash at will: running at the same time the two following commands

Re: panic: biodone2 already

2018-08-06 Thread Emmanuel Dreyfus
Emmanuel Dreyfus wrote: > Here it is. And here is another flavor below I am now convinced the problem came with NetBSD 8.0: I found that two other domU crashed on daily backup since NetBSD 8.0 upgrade, and the panic is also biodone2 already. I start downgrading today. uvm_fault(0xc06d4960, 0

Re: panic: biodone2 already

2018-08-06 Thread Emmanuel Dreyfus
Martin Husemann wrote: > What driver is this? xbd, this is a NetBSD-8.0/i386 Xen domU on top of a NetBSD-8.0/amd64 dom0 running on Xen 4.8.3. In the dom0, the disk image is in a file in a FFSv2 filesystem on a RAIDframe RAID 1, with two wd disks. -- Emmanuel Dreyfus http://hcpnet.free.fr/pub

Re: panic: biodone2 already

2018-08-06 Thread Martin Husemann
On Mon, Aug 06, 2018 at 08:37:56PM +0200, Emmanuel Dreyfus wrote: > cpu0: Begin traceback... > vpanic(c04da74f,dcbdfdd4,dcbdfe58,c010bb65,c04da74f,dcbdfe64,dcbdfe64,4,dcbde2c0,210202) > at netbsd:vpanic+0x12d > panic(c04da74f,dcbdfe64,dcbdfe64,4,dcbde2c0,210202,fd893fff,8,c06ba580,ff491) > at net

Re: panic: biodone2 already

2018-08-06 Thread Emmanuel Dreyfus
Jaromír Doleček wrote: > Can you give full backtrace? Here it is. I wonder if it could change things without -o log cpu0: Begin traceback... vpanic(c04da74f,dcbdfdd4,dcbdfe58,c010bb65,c04da74f,dcbdfe64,dcbdfe64,4,dcbde2c0,210202) at netbsd:vpanic+0x12d panic(c04da74f,dcbdfe64,dcbdfe64,4,dcbde2

Re: panic: biodone2 already

2018-08-06 Thread Jaromír Doleček
This is always a bug, driver processes same buf twice. It can do harm. If the buf is reused for some other I/O, system can fail to store data, or claim to read data when it didn't. Can you give full backtrace? Jaromir 2018-08-06 17:56 GMT+02:00 Emmanuel Dreyfus : > Hello > > I have a Xen domU th