Re: cgdstrategy: divide fault in supervisor mode

2016-09-13 Thread Michael van Elst
On Wed, Sep 14, 2016 at 07:18:56AM +0100, Alexander Nasonov wrote:
> Michael van Elst wrote:
> > Ah, maybe then:
> > 
> > --- cgd.c   5 Aug 2016 08:24:46 -   1.110
> > +++ cgd.c   13 Sep 2016 21:43:27 -
> > @@ -305,13 +305,17 @@
> >  static void
> >  cgdstrategy(struct buf *bp)
> >  {
> > -   struct  cgd_softc *cs = getcgd_softc(bp->b_dev);
> > -   struct  dk_softc *dksc = &cs->sc_dksc;
> > -   struct  disk_geom *dg = &dksc->sc_dkdev.dk_geom;
> > +   struct  cgd_softc *cs;
> > +   struct  dk_softc *dksc;
> > +   struct  disk_geom *dg;
> >  
> > DPRINTF_FOLLOW(("cgdstrategy(%p): b_bcount = %ld\n", bp,
> > (long)bp->b_bcount));
> >  
> > +   GETCGD_SOFTC(cs, bp->b_dev);
> > +   dksc = &cs->sc_dksc;
> > +   dg = &dksc->sc_dkdev.dk_geom;
> > +
> 
> It will not compile because cgdstrategy() returns void.


Right. This needs to be written differently. Instead of GETCGD_SOFTC()
use:

cs = getcgd_softc(bp->b_dev);
if (!cs) {
bp->b_error = ENXIO;
biodone(bp);
return;
}

-- 
Michael van Elst
Internet: mlel...@serpens.de
"A potential Snark may lurk in every tree."


Re: cgdstrategy: divide fault in supervisor mode

2016-09-13 Thread Alexander Nasonov
Michael van Elst wrote:
> Ah, maybe then:
> 
> --- cgd.c   5 Aug 2016 08:24:46 -   1.110
> +++ cgd.c   13 Sep 2016 21:43:27 -
> @@ -305,13 +305,17 @@
>  static void
>  cgdstrategy(struct buf *bp)
>  {
> -   struct  cgd_softc *cs = getcgd_softc(bp->b_dev);
> -   struct  dk_softc *dksc = &cs->sc_dksc;
> -   struct  disk_geom *dg = &dksc->sc_dkdev.dk_geom;
> +   struct  cgd_softc *cs;
> +   struct  dk_softc *dksc;
> +   struct  disk_geom *dg;
>  
> DPRINTF_FOLLOW(("cgdstrategy(%p): b_bcount = %ld\n", bp,
> (long)bp->b_bcount));
>  
> +   GETCGD_SOFTC(cs, bp->b_dev);
> +   dksc = &cs->sc_dksc;
> +   dg = &dksc->sc_dkdev.dk_geom;
> +

It will not compile because cgdstrategy() returns void.

Alex


Re: Move kern_ctf to the dtrace module?

2016-09-13 Thread Paul Goyette
It was suggested to me in private Email that perhaps the file 
sys/kern/kern_ctf.c should be moved out of src/sys and into "the 
module's [source] directory."


I'd rather not move the file into src/external//dtrace/dev/fbt/ 
because kern_ctf.c and the existing file both include some headers with 
the same name, but expecting different files to be included!  (dtrace 
provides its own versions of sys/kobj.h and sys/kobj_impl.h headers.)


We generally don't put actual source files into src/sys/modules/* but it 
might be appropriate in this case.


Either way, I'm not really happy about using file-specific CPP_FLAGS in 
sys/modules/dtrace/fbt/Makefile but the only other way to make this work 
would be to create a dtrace_fbt_ctf module, on which dtrace_fbt would 
depend.  This introduces an additional layer of module-autoload, and 
would occupy a slightly larger in-memory footprint, but should not have 
any other impact.


So, questions to be answered:

* Where should sys/kern/kern_ctf.c reside?
* Should kern_ctf.c be a separate (sub)module?

Any thoughts?

On Tue, 13 Sep 2016, Paul Goyette wrote:


On Mon, 12 Sep 2016, Paul Goyette wrote:

Currently, if you include KDTRACE_HOOKS in your kernel, we automatically 
include src/sys/kern/kern_ctf.[co] in the resulting kernel.


kern_ctf defines only one global symbol, mod_ctf_get(), according to the 
generated netbsd.map file, and there are no references anywhere else in the 
kernel to this routine;  the only reference is within the sub-module 
dtrace_fbt.kmod


I would like to suggest that kern_ctf be removed from the kernel, and added 
to the dtrace_fbt module.  And, since kern_ctf references stuff in 
net/zlib.c, the dtrace_fbt module should depend on zlib module.  And 
finally, the ktrace_hooks dependency can be removed from the zlib.c line in 
net/files.net


The bottom line is that zlib code won't be included in kernels that don't 
need it, yet will still be available when using dtrace (which is only 
available as a module).



If I don't see any significant objections, I'll post diffs in a day or so 
before making any commits.


Diffs are attached here.

I plan to commit within the next couple of days...



+--+--++
| Paul Goyette | PGP Key fingerprint: | E-mail addresses:  |
| (Retired)| FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com   |
| Kernel Developer | 0786 F758 55DE 53BA 7731 | pgoyette at netbsd.org |
+--+--++


+--+--++
| Paul Goyette | PGP Key fingerprint: | E-mail addresses:  |
| (Retired)| FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com   |
| Kernel Developer | 0786 F758 55DE 53BA 7731 | pgoyette at netbsd.org |
+--+--++


Re: cgdstrategy: divide fault in supervisor mode

2016-09-13 Thread Michael van Elst
On Tue, Sep 13, 2016 at 09:27:11PM +0100, Alexander Nasonov wrote:
> 
> I can reproduce division by zero but not when rebooting. If I take
> an unconfigured cgd device, i.e. cgd2 and run
> 
>   mount /dev/cgd2d /tmp
> 
> the kernel will panic instead of returning ENXIO.

Ah, maybe then:

--- cgd.c   5 Aug 2016 08:24:46 -   1.110
+++ cgd.c   13 Sep 2016 21:43:27 -
@@ -305,13 +305,17 @@
 static void
 cgdstrategy(struct buf *bp)
 {
-   struct  cgd_softc *cs = getcgd_softc(bp->b_dev);
-   struct  dk_softc *dksc = &cs->sc_dksc;
-   struct  disk_geom *dg = &dksc->sc_dkdev.dk_geom;
+   struct  cgd_softc *cs;
+   struct  dk_softc *dksc;
+   struct  disk_geom *dg;
 
DPRINTF_FOLLOW(("cgdstrategy(%p): b_bcount = %ld\n", bp,
(long)bp->b_bcount));
 
+   GETCGD_SOFTC(cs, bp->b_dev);
+   dksc = &cs->sc_dksc;
+   dg = &dksc->sc_dkdev.dk_geom;
+
/*
 * Reject unaligned writes.  We can encrypt and decrypt only
 * complete disk sectors, and we let the ciphers require their



Greetings,
-- 
Michael van Elst
Internet: mlel...@serpens.de
"A potential Snark may lurk in every tree."


Re: cgdstrategy: divide fault in supervisor mode

2016-09-13 Thread Alexander Nasonov
Alexander Nasonov wrote:
> I can examine data at 0x34 offset and it's indeed 0.

Correction: $rdi+0x40 is the correct location.

I also inspected low half of dev_t ($rbx+0x38) and its value was 0x1423
which corresponds to:

brw-r-  1 root  operator  20, 35 Dec 13  2015 /dev/cgd2d


Alex


Re: cgdstrategy: divide fault in supervisor mode

2016-09-13 Thread Alexander Nasonov
Michael van Elst wrote:
> That would require dg_secsize to be 0 which is difficult to do
> because the drivers initialize the value and the common disk_set_info()
> function fixes up a zero value.

I can reproduce division by zero but not when rebooting. If I take
an unconfigured cgd device, i.e. cgd2 and run

mount /dev/cgd2d /tmp

the kernel will panic instead of returning ENXIO.

> But maybe the dg pointer is bad? Please have a look at the %rdi
> register.

I don't know what was rdi's value when it crashed during reboot but
crashes when mounting /dev/cgd2d all have good kernel-space values.
I can examine data at 0x34 offset and it's indeed 0.

$ crash -M netbsd.12.core
Crash version 7.99.36, image version 7.99.36.
System panicked: dump forced via kernel debugger
Backtrace from time of crash is available.
crash> dmesg|tail
iwm0: 11g rates: 1Mbps 2Mbps 5.5Mbps 11Mbps 6Mbps 9Mbps 12Mbps 18Mbps 24Mbps 36M
bps 48Mbps 54Mbps
acpibat0: normal capacity on 'charge state'
fatal integer divide fault in supervisor mode
trap type 8 code 0 rip 808db36f cs 8 rflags 10246 cr2 4d8000 ilevel 0 rs
p fe8116cfba50
curlwp 0xfe836fcbab00 pid 13.1 lowest kstack 0xfe8116cf82c0

dumping to dev 20,17 (offset=212951, size=3119109):
dump
crash> bt
_KERNEL_OPT_NARCNET() at 0
_KERNEL_OPT_NARCNET() at 0
db_reboot_cmd() at db_reboot_cmd
db_command() at db_command+0xeb
db_command_loop() at db_command_loop+0x90
db_trap() at db_trap+0xe3
kdb_trap() at kdb_trap+0xe1
trap() at trap+0x574
--- trap (number 8) ---
cgdstrategy() at cgdstrategy+0x26
bdev_strategy() at bdev_strategy+0x68
spec_strategy() at spec_strategy+0x81
VOP_STRATEGY() at VOP_STRATEGY+0x33
bio_doread() at bio_doread+0x98
bread() at bread+0x1a
ffs_mountfs() at ffs_mountfs+0x170
ffs_mount() at ffs_mount+0x227
VFS_MOUNT() at VFS_MOUNT+0x34
mount_domount() at mount_domount+0x122
do_sys_mount() at do_sys_mount+0x20f
sys___mount50() at sys___mount50+0x33
syscall() at syscall+0x15b
--- syscall (number 410) ---
75c7da:
crash> ps
PIDLID S CPU FLAGS   STRUCT LWP *   NAME WAIT
13   >   1 7   1 0   fe836fcbab00  mount_ffs
12   1 2   1   802   fe811681d2a0  mount
81 2   1   802   fe811681d6c0ksh
21 2   1   802   fe811681dae0ksh
11 2   1   802   fe81163f5680   init
...

Alex


Re: cgdstrategy: divide fault in supervisor mode

2016-09-13 Thread Michael van Elst
al...@yandex.ru (Alexander Nasonov) writes:

>All mount/umount worked but when I ran reboot, the system trapped here:

>fatal integer divide fault in supervisor mode
>trap type 8 code 0 rip 808db36f cs 8 rflags 10246 cr2 efd...
>curlwp 0xfe81163b4a40 pid 276.1 lowest kstack 0xfe8117343...
>kernel: integer divide fault trap, code=0
>Stopped in pid 276.1 (reboot) atnetbsd:cgdstrategy+0x26:

>if (bp->b_blkno < 0 ||
>(bp->b_bcount % dg->dg_secsize) != 0 ||

>808db36b:   89 c8   mov%ecx,%eax
>808db36d:   31 d2   xor%edx,%edx
>808db36f:   f7 77 40divl   0x40(%rdi)


That would require dg_secsize to be 0 which is difficult to do
because the drivers initialize the value and the common disk_set_info()
function fixes up a zero value.

But maybe the dg pointer is bad? Please have a look at the %rdi
register.

N.B. there are some rare failure paths in getcgd_softc() that would
return a NULL pointer that isn't checked. If the kernel maps zeros
at NULL this could trigger a divide error here.


-- 
-- 
Michael van Elst
Internet: mlel...@serpens.de
"A potential Snark may lurk in every tree."


cgdstrategy: divide fault in supervisor mode

2016-09-13 Thread Alexander Nasonov
Someone warned me that adding cgd to dump devices can have bad
consequences. I think I caught one possible bad case yesterday.
I was lucky enough to still have my data.

My setup is quite complicated. I have a small root on wd0a which
does only one thing: to mount a real root on cgd0a and chroot to
it. The rest of the system is on cgd1.

I was in a single-user mode, inside /altroot (iirc), all fs mounted
but I wanted to remount them in read-only mode. Some of them couldn't
be unmounted and I forced umounts with the -f flag. Then I mounted them
with read-only flag. I don't remember exact commands but I have nested
mount points, e.g. /var/log inside /var and I was definitely trying to
remount both inner and outer fs.

All mount/umount worked but when I ran reboot, the system trapped here:

fatal integer divide fault in supervisor mode
trap type 8 code 0 rip 808db36f cs 8 rflags 10246 cr2 efd...
curlwp 0xfe81163b4a40 pid 276.1 lowest kstack 0xfe8117343...
kernel: integer divide fault trap, code=0
Stopped in pid 276.1 (reboot) atnetbsd:cgdstrategy+0x26:
4
0(%rdi),%eax

This it what I run:

NetBSD neva 7.99.36 NetBSD 7.99.36 (GENERIC) #0: Fri Sep  2 22:04:02 BST 2016  
alnsn@nebeda:/home/alnsn/netbsd-current/clean/src/sys/arch/amd64/compile/obj/GENERIC
 amd64

Sources checked out on Sep 2.

Looking at the assembly, it appears that the fault happened at the
second line of this branch:

if (bp->b_blkno < 0 ||
(bp->b_bcount % dg->dg_secsize) != 0 ||

(offset of b_blkno is 0x48, b_bcount's offset is 0x34).

808db349 :
808db349:   55  push   %rbp
808db34a:   48 89 e5mov%rsp,%rbp
808db34d:   53  push   %rbx
808db34e:   48 83 ec 08 sub$0x8,%rsp
808db352:   48 89 fbmov%rdi,%rbx
808db355:   48 8b 7f 38 mov0x38(%rdi),%rdi
808db359:   e8 4d fe ff ff  callq  808db1ab 
808db35e:   48 83 7b 48 00  cmpq   $0x0,0x48(%rbx)
808db363:   78 3d   js 808db3a2 
808db365:   48 89 c7mov%rax,%rdi
808db368:   8b 4b 34mov0x34(%rbx),%ecx
808db36b:   89 c8   mov%ecx,%eax
808db36d:   31 d2   xor%edx,%edx
808db36f:   f7 77 40divl   0x40(%rdi)

Alex