Re: cgdstrategy: divide fault in supervisor mode
On Wed, Sep 14, 2016 at 07:18:56AM +0100, Alexander Nasonov wrote: > Michael van Elst wrote: > > Ah, maybe then: > > > > --- cgd.c 5 Aug 2016 08:24:46 - 1.110 > > +++ cgd.c 13 Sep 2016 21:43:27 - > > @@ -305,13 +305,17 @@ > > static void > > cgdstrategy(struct buf *bp) > > { > > - struct cgd_softc *cs = getcgd_softc(bp->b_dev); > > - struct dk_softc *dksc = &cs->sc_dksc; > > - struct disk_geom *dg = &dksc->sc_dkdev.dk_geom; > > + struct cgd_softc *cs; > > + struct dk_softc *dksc; > > + struct disk_geom *dg; > > > > DPRINTF_FOLLOW(("cgdstrategy(%p): b_bcount = %ld\n", bp, > > (long)bp->b_bcount)); > > > > + GETCGD_SOFTC(cs, bp->b_dev); > > + dksc = &cs->sc_dksc; > > + dg = &dksc->sc_dkdev.dk_geom; > > + > > It will not compile because cgdstrategy() returns void. Right. This needs to be written differently. Instead of GETCGD_SOFTC() use: cs = getcgd_softc(bp->b_dev); if (!cs) { bp->b_error = ENXIO; biodone(bp); return; } -- Michael van Elst Internet: mlel...@serpens.de "A potential Snark may lurk in every tree."
Re: cgdstrategy: divide fault in supervisor mode
Michael van Elst wrote: > Ah, maybe then: > > --- cgd.c 5 Aug 2016 08:24:46 - 1.110 > +++ cgd.c 13 Sep 2016 21:43:27 - > @@ -305,13 +305,17 @@ > static void > cgdstrategy(struct buf *bp) > { > - struct cgd_softc *cs = getcgd_softc(bp->b_dev); > - struct dk_softc *dksc = &cs->sc_dksc; > - struct disk_geom *dg = &dksc->sc_dkdev.dk_geom; > + struct cgd_softc *cs; > + struct dk_softc *dksc; > + struct disk_geom *dg; > > DPRINTF_FOLLOW(("cgdstrategy(%p): b_bcount = %ld\n", bp, > (long)bp->b_bcount)); > > + GETCGD_SOFTC(cs, bp->b_dev); > + dksc = &cs->sc_dksc; > + dg = &dksc->sc_dkdev.dk_geom; > + It will not compile because cgdstrategy() returns void. Alex
Re: Move kern_ctf to the dtrace module?
It was suggested to me in private Email that perhaps the file sys/kern/kern_ctf.c should be moved out of src/sys and into "the module's [source] directory." I'd rather not move the file into src/external//dtrace/dev/fbt/ because kern_ctf.c and the existing file both include some headers with the same name, but expecting different files to be included! (dtrace provides its own versions of sys/kobj.h and sys/kobj_impl.h headers.) We generally don't put actual source files into src/sys/modules/* but it might be appropriate in this case. Either way, I'm not really happy about using file-specific CPP_FLAGS in sys/modules/dtrace/fbt/Makefile but the only other way to make this work would be to create a dtrace_fbt_ctf module, on which dtrace_fbt would depend. This introduces an additional layer of module-autoload, and would occupy a slightly larger in-memory footprint, but should not have any other impact. So, questions to be answered: * Where should sys/kern/kern_ctf.c reside? * Should kern_ctf.c be a separate (sub)module? Any thoughts? On Tue, 13 Sep 2016, Paul Goyette wrote: On Mon, 12 Sep 2016, Paul Goyette wrote: Currently, if you include KDTRACE_HOOKS in your kernel, we automatically include src/sys/kern/kern_ctf.[co] in the resulting kernel. kern_ctf defines only one global symbol, mod_ctf_get(), according to the generated netbsd.map file, and there are no references anywhere else in the kernel to this routine; the only reference is within the sub-module dtrace_fbt.kmod I would like to suggest that kern_ctf be removed from the kernel, and added to the dtrace_fbt module. And, since kern_ctf references stuff in net/zlib.c, the dtrace_fbt module should depend on zlib module. And finally, the ktrace_hooks dependency can be removed from the zlib.c line in net/files.net The bottom line is that zlib code won't be included in kernels that don't need it, yet will still be available when using dtrace (which is only available as a module). If I don't see any significant objections, I'll post diffs in a day or so before making any commits. Diffs are attached here. I plan to commit within the next couple of days... +--+--++ | Paul Goyette | PGP Key fingerprint: | E-mail addresses: | | (Retired)| FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com | | Kernel Developer | 0786 F758 55DE 53BA 7731 | pgoyette at netbsd.org | +--+--++ +--+--++ | Paul Goyette | PGP Key fingerprint: | E-mail addresses: | | (Retired)| FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com | | Kernel Developer | 0786 F758 55DE 53BA 7731 | pgoyette at netbsd.org | +--+--++
Re: cgdstrategy: divide fault in supervisor mode
On Tue, Sep 13, 2016 at 09:27:11PM +0100, Alexander Nasonov wrote: > > I can reproduce division by zero but not when rebooting. If I take > an unconfigured cgd device, i.e. cgd2 and run > > mount /dev/cgd2d /tmp > > the kernel will panic instead of returning ENXIO. Ah, maybe then: --- cgd.c 5 Aug 2016 08:24:46 - 1.110 +++ cgd.c 13 Sep 2016 21:43:27 - @@ -305,13 +305,17 @@ static void cgdstrategy(struct buf *bp) { - struct cgd_softc *cs = getcgd_softc(bp->b_dev); - struct dk_softc *dksc = &cs->sc_dksc; - struct disk_geom *dg = &dksc->sc_dkdev.dk_geom; + struct cgd_softc *cs; + struct dk_softc *dksc; + struct disk_geom *dg; DPRINTF_FOLLOW(("cgdstrategy(%p): b_bcount = %ld\n", bp, (long)bp->b_bcount)); + GETCGD_SOFTC(cs, bp->b_dev); + dksc = &cs->sc_dksc; + dg = &dksc->sc_dkdev.dk_geom; + /* * Reject unaligned writes. We can encrypt and decrypt only * complete disk sectors, and we let the ciphers require their Greetings, -- Michael van Elst Internet: mlel...@serpens.de "A potential Snark may lurk in every tree."
Re: cgdstrategy: divide fault in supervisor mode
Alexander Nasonov wrote: > I can examine data at 0x34 offset and it's indeed 0. Correction: $rdi+0x40 is the correct location. I also inspected low half of dev_t ($rbx+0x38) and its value was 0x1423 which corresponds to: brw-r- 1 root operator 20, 35 Dec 13 2015 /dev/cgd2d Alex
Re: cgdstrategy: divide fault in supervisor mode
Michael van Elst wrote: > That would require dg_secsize to be 0 which is difficult to do > because the drivers initialize the value and the common disk_set_info() > function fixes up a zero value. I can reproduce division by zero but not when rebooting. If I take an unconfigured cgd device, i.e. cgd2 and run mount /dev/cgd2d /tmp the kernel will panic instead of returning ENXIO. > But maybe the dg pointer is bad? Please have a look at the %rdi > register. I don't know what was rdi's value when it crashed during reboot but crashes when mounting /dev/cgd2d all have good kernel-space values. I can examine data at 0x34 offset and it's indeed 0. $ crash -M netbsd.12.core Crash version 7.99.36, image version 7.99.36. System panicked: dump forced via kernel debugger Backtrace from time of crash is available. crash> dmesg|tail iwm0: 11g rates: 1Mbps 2Mbps 5.5Mbps 11Mbps 6Mbps 9Mbps 12Mbps 18Mbps 24Mbps 36M bps 48Mbps 54Mbps acpibat0: normal capacity on 'charge state' fatal integer divide fault in supervisor mode trap type 8 code 0 rip 808db36f cs 8 rflags 10246 cr2 4d8000 ilevel 0 rs p fe8116cfba50 curlwp 0xfe836fcbab00 pid 13.1 lowest kstack 0xfe8116cf82c0 dumping to dev 20,17 (offset=212951, size=3119109): dump crash> bt _KERNEL_OPT_NARCNET() at 0 _KERNEL_OPT_NARCNET() at 0 db_reboot_cmd() at db_reboot_cmd db_command() at db_command+0xeb db_command_loop() at db_command_loop+0x90 db_trap() at db_trap+0xe3 kdb_trap() at kdb_trap+0xe1 trap() at trap+0x574 --- trap (number 8) --- cgdstrategy() at cgdstrategy+0x26 bdev_strategy() at bdev_strategy+0x68 spec_strategy() at spec_strategy+0x81 VOP_STRATEGY() at VOP_STRATEGY+0x33 bio_doread() at bio_doread+0x98 bread() at bread+0x1a ffs_mountfs() at ffs_mountfs+0x170 ffs_mount() at ffs_mount+0x227 VFS_MOUNT() at VFS_MOUNT+0x34 mount_domount() at mount_domount+0x122 do_sys_mount() at do_sys_mount+0x20f sys___mount50() at sys___mount50+0x33 syscall() at syscall+0x15b --- syscall (number 410) --- 75c7da: crash> ps PIDLID S CPU FLAGS STRUCT LWP * NAME WAIT 13 > 1 7 1 0 fe836fcbab00 mount_ffs 12 1 2 1 802 fe811681d2a0 mount 81 2 1 802 fe811681d6c0ksh 21 2 1 802 fe811681dae0ksh 11 2 1 802 fe81163f5680 init ... Alex
Re: cgdstrategy: divide fault in supervisor mode
al...@yandex.ru (Alexander Nasonov) writes: >All mount/umount worked but when I ran reboot, the system trapped here: >fatal integer divide fault in supervisor mode >trap type 8 code 0 rip 808db36f cs 8 rflags 10246 cr2 efd... >curlwp 0xfe81163b4a40 pid 276.1 lowest kstack 0xfe8117343... >kernel: integer divide fault trap, code=0 >Stopped in pid 276.1 (reboot) atnetbsd:cgdstrategy+0x26: >if (bp->b_blkno < 0 || >(bp->b_bcount % dg->dg_secsize) != 0 || >808db36b: 89 c8 mov%ecx,%eax >808db36d: 31 d2 xor%edx,%edx >808db36f: f7 77 40divl 0x40(%rdi) That would require dg_secsize to be 0 which is difficult to do because the drivers initialize the value and the common disk_set_info() function fixes up a zero value. But maybe the dg pointer is bad? Please have a look at the %rdi register. N.B. there are some rare failure paths in getcgd_softc() that would return a NULL pointer that isn't checked. If the kernel maps zeros at NULL this could trigger a divide error here. -- -- Michael van Elst Internet: mlel...@serpens.de "A potential Snark may lurk in every tree."
cgdstrategy: divide fault in supervisor mode
Someone warned me that adding cgd to dump devices can have bad consequences. I think I caught one possible bad case yesterday. I was lucky enough to still have my data. My setup is quite complicated. I have a small root on wd0a which does only one thing: to mount a real root on cgd0a and chroot to it. The rest of the system is on cgd1. I was in a single-user mode, inside /altroot (iirc), all fs mounted but I wanted to remount them in read-only mode. Some of them couldn't be unmounted and I forced umounts with the -f flag. Then I mounted them with read-only flag. I don't remember exact commands but I have nested mount points, e.g. /var/log inside /var and I was definitely trying to remount both inner and outer fs. All mount/umount worked but when I ran reboot, the system trapped here: fatal integer divide fault in supervisor mode trap type 8 code 0 rip 808db36f cs 8 rflags 10246 cr2 efd... curlwp 0xfe81163b4a40 pid 276.1 lowest kstack 0xfe8117343... kernel: integer divide fault trap, code=0 Stopped in pid 276.1 (reboot) atnetbsd:cgdstrategy+0x26: 4 0(%rdi),%eax This it what I run: NetBSD neva 7.99.36 NetBSD 7.99.36 (GENERIC) #0: Fri Sep 2 22:04:02 BST 2016 alnsn@nebeda:/home/alnsn/netbsd-current/clean/src/sys/arch/amd64/compile/obj/GENERIC amd64 Sources checked out on Sep 2. Looking at the assembly, it appears that the fault happened at the second line of this branch: if (bp->b_blkno < 0 || (bp->b_bcount % dg->dg_secsize) != 0 || (offset of b_blkno is 0x48, b_bcount's offset is 0x34). 808db349 : 808db349: 55 push %rbp 808db34a: 48 89 e5mov%rsp,%rbp 808db34d: 53 push %rbx 808db34e: 48 83 ec 08 sub$0x8,%rsp 808db352: 48 89 fbmov%rdi,%rbx 808db355: 48 8b 7f 38 mov0x38(%rdi),%rdi 808db359: e8 4d fe ff ff callq 808db1ab 808db35e: 48 83 7b 48 00 cmpq $0x0,0x48(%rbx) 808db363: 78 3d js 808db3a2 808db365: 48 89 c7mov%rax,%rdi 808db368: 8b 4b 34mov0x34(%rbx),%ecx 808db36b: 89 c8 mov%ecx,%eax 808db36d: 31 d2 xor%edx,%edx 808db36f: f7 77 40divl 0x40(%rdi) Alex