Re: patch: debug instrumentation for dev/raidframe/rf_netbsdkintf.c
In article <24279.1548106...@splode.eterna.com.au>, matthew green wrote: >> > @@ -472,6 +472,9 @@ >> >const char *bootname = device_xname(bdv); >> >size_t len = strlen(bootname); >> > >> > + if (bdv == NULL) >> > + return 0; >> > + >> >> This looked suspicious, even before I read the code. >> >> The question is if it is ever legitimate for bdv to be NULL. > >infact, bdv has already been dereferenced at this point: >the assignment to bootname 3 lines up. Yes, that change should be reverted. christos
Re: patch: debug instrumentation for dev/raidframe/rf_netbsdkintf.c
On Tue, Jan 22, 2019 at 08:32:48AM +1100, matthew green wrote: > > > @@ -472,6 +472,9 @@ > > > const char *bootname = device_xname(bdv); > > > size_t len = strlen(bootname); > > > > > > + if (bdv == NULL) > > > + return 0; > > > + > > > > This looked suspicious, even before I read the code. > > > > The question is if it is ever legitimate for bdv to be NULL. > > infact, bdv has already been dereferenced at this point: > the assignment to bootname 3 lines up. Argh. That's what I get for first removing the code I added which correctly moved the initialization of bootname and len to after the bdv==NULL check and the re-adding it at midnight. --chris
Re: Audio device mmap and kevents
> It's better to use write(2) for this purpose. Often, yes, but not, I think, always. > [...], mmap'ing audio device does not lead to improve any performance > or latency. Then something is wrong. (a) It's the rare port on which copying to mmap()ped memory is no faster than a user/kernel crossing plus a copyin. (b) Once bytes are write()ten, userland cannot change them. If the kernel does a lot of buffering, userland has to know the data it wants played a significant time in advance; if the kernel does little buffering, userland has a relatively hard real-time constraint in that it has to write() on a tight schedule or output audio drops out. Writing into an mmap()ped ring buffer takes advantage of the speed differential and fixes the latency issue without requiring real-time behaviour out of userland. (The API would need fixing, though; having to do a syscall to find out where the audio is playing defeats at least some of the performance advantage. Those numbers should appear in the mmap()ped buffer, or perhaps a different mmap()ped buffer.) Of course, if you're just playing a canned audio data stream and doing nothing else, the latency issue doesn't matter; you can have the kernel do lots of buffering and you're happy. But things such as games that want to play audio but also want to be able make that audio react very fast (for human values of "very fast") to unpredictable events may be willing to accept the inconvenience of an mmap()ped ring buffer for the sake of getting both lots of buffering and low latency. Also, if you have a repeating audio track, then at least in principle you can just copy it into the ring buffer and forget it, rather than having to constantly write() - though, of course, this depends on your repeat length dividing the ring buffer size. /~\ The ASCII Mouse \ / Ribbon Campaign X Against HTMLmo...@rodents-montreal.org / \ Email! 7D C8 61 52 5D E7 2D 39 4E F1 31 3E E8 B3 27 4B
re: patch: debug instrumentation for dev/raidframe/rf_netbsdkintf.c
> >>> @@ -472,6 +472,9 @@ > >>> const char *bootname = device_xname(bdv); > >>> size_t len = strlen(bootname); > >>> > >>> + if (bdv == NULL) > >>> + return 0; > >>> + > >> > >> This looked suspicious, even before I read the code. > >> > >> The question is if it is ever legitimate for bdv to be NULL. > > > > infact, bdv has already been dereferenced at this point: > > the assignment to bootname 3 lines up. > > So, if we're going to insert a KASSERT() we would need to separate > the assignment from bootname's declaration, and do the assignment > after the KASSERT() i am not a huge fan of KASSERT(ptr != NULL). page fault is pretty much just as clear as the assert, and does not need special code to handle it. that said, i do see some utility in it as a guide to others reading this code, and won't object to this usage. .mrg.
re: patch: debug instrumentation for dev/raidframe/rf_netbsdkintf.c
On Tue, 22 Jan 2019, matthew green wrote: @@ -472,6 +472,9 @@ const char *bootname = device_xname(bdv); size_t len = strlen(bootname); + if (bdv == NULL) + return 0; + This looked suspicious, even before I read the code. The question is if it is ever legitimate for bdv to be NULL. infact, bdv has already been dereferenced at this point: the assignment to bootname 3 lines up. So, if we're going to insert a KASSERT() we would need to separate the assignment from bootname's declaration, and do the assignment after the KASSERT() +--+--++ | Paul Goyette | PGP Key fingerprint: | E-mail addresses: | | (Retired)| FA29 0E3B 35AF E8AE 6651 | paul at whooppee dot com | | Kernel Developer | 0786 F758 55DE 53BA 7731 | pgoyette at netbsd dot org | +--+--++
Re: patch: debug instrumentation for dev/raidframe/rf_netbsdkintf.c
Christoph Badura writes: > On Mon, Jan 21, 2019 at 04:24:49PM -0500, Greg Troxel wrote: >> Separetaly from debug code being careful, if it's a rule that bdv can't >> be NULL, it's just as well to put in a KASSERT. Then we'll find out >> where that isn't true and can fix it. > > I must not be getting something. If rf_containsboot() is passed a NULL > pointer, it will trap with a page fault and we can get a stacktrace from > ddb. If we add a KASSERT it will panic and we can get a stacktrace from > ddb. I don't see where the benefit in that is. The benefit is that the panic from the KASSERT is cleaner, and it documents for readers of the function that the author believes it is a rule. And it will definitely fault even on machines will can dereference NULL - that is technically if not practically architecture dependent. > Do you think we should add a KASSERT to document that rf_containsboot() > does expect a valid pointer? I'd see value in that and would go ahead with > it. Yes. Basically, in any kernel function, if there is a requirement that a pointer be non-NULL, then there should be a KASSERT and the code should then feel free to assume it is valid. When a KASSERT is hit, the user gets a message with the KASSERT expression and the source file/line, instead of a page fault traceback. It's very easy and quick to go from that printout to the KASSERT that failed. Plus, adding the KASSERT, or talking about adding it, is a good way to check if there is consensus among the other developers that this really is a rule. In NetBSD, people are really good at telling you you're wrong!
re: patch: debug instrumentation for dev/raidframe/rf_netbsdkintf.c
> > @@ -472,6 +472,9 @@ > > const char *bootname = device_xname(bdv); > > size_t len = strlen(bootname); > > > > + if (bdv == NULL) > > + return 0; > > + > > This looked suspicious, even before I read the code. > > The question is if it is ever legitimate for bdv to be NULL. infact, bdv has already been dereferenced at this point: the assignment to bootname 3 lines up. .mrg.
Re: patch: debug instrumentation for dev/raidframe/rf_netbsdkintf.c
On Mon, Jan 21, 2019 at 04:24:49PM -0500, Greg Troxel wrote: > Separetaly from debug code being careful, if it's a rule that bdv can't > be NULL, it's just as well to put in a KASSERT. Then we'll find out > where that isn't true and can fix it. I must not be getting something. If rf_containsboot() is passed a NULL pointer, it will trap with a page fault and we can get a stacktrace from ddb. If we add a KASSERT it will panic and we can get a stacktrace from ddb. I don't see where the benefit in that is. Do you think we should add a KASSERT to document that rf_containsboot() does expect a valid pointer? I'd see value in that and would go ahead with it. --chris
Re: patch: debug instrumentation for dev/raidframe/rf_netbsdkintf.c
Christoph Badura writes: >> > + if (bdv == NULL) >> > + return 0; >> > + >> >> This looked suspicious, even before I read the code. >> The question is if it is ever legitimate for bdv to be NULL. > > That is an excellent point. The short answer is, no it isn't. And it > never was NULL in the code that used it. I got a trap into ddb because of > a null pointer deref in the DPRINTF that I changed (in the 4th hunk of my > patch). > >> I am a fan of having comments before every function declaring their >> preconditions and what they guarantee on exit. Then all uses can be >> audited if they guarantee the the preconditions are true. This approach >> is really hard-core in eiffel, known as design by contract. > > Yes, I totally agree. Also to the rest of your message that I didn't quote. > > When I prepared the patch yesterday I was about to delete the above change > because at first I couldn't remember why I added it ~3 weeks ago. That > should have raised a big fat warning sign. > > I thought about adding a comment after I read your private mail > earlier today. In the end I decided it is better to not change > rf_containsboot() and instead introduce a wrapper for the benefit of the > DPRINTF. Separetaly from debug code being careful, if it's a rule that bdv can't be NULL, it's just as well to put in a KASSERT. Then we'll find out where that isn't true and can fix it.
Re: patch: debug instrumentation for dev/raidframe/rf_netbsdkintf.c
On Mon, Jan 21, 2019 at 06:36:37PM +0100, Christoph Badura wrote: > I think the following is better. Compile-tested only for both #ifdef > conditions, but I think that is OK. Ugh. I forgot to put a comment on that function. How about this: /* * Provide a wrapper around rf_containsboot that handles NULL pointers * gradefully. For use in DPRINTF(). */ > Index: rf_netbsdkintf.c > === > RCS file: /cvsroot/src/sys/dev/raidframe/rf_netbsdkintf.c,v > retrieving revision 1.356 > diff -u -r1.356 rf_netbsdkintf.c > --- rf_netbsdkintf.c 23 Jan 2018 22:42:29 - 1.356 > +++ rf_netbsdkintf.c 21 Jan 2019 15:01:24 - > @@ -491,6 +491,15 @@ > return 0; > } > > +#ifdef DEBUG_ROOT > +static int > +debug_rf_containsboot(RF_Raid_t *r, device_t bdv) { > + if (bdv == NULL) > + return 0; > + return rf_containsboot(r, bdv); > +} > +#endif > + --chris
Re: patch: debug instrumentation for dev/raidframe/rf_netbsdkintf.c
On Mon, Jan 21, 2019 at 10:42:10AM -0500, Greg Troxel wrote: > Christoph Badura writes: > > Index: rf_netbsdkintf.c > > === > > RCS file: /cvsroot/src/sys/dev/raidframe/rf_netbsdkintf.c,v > > retrieving revision 1.356 > > diff -u -r1.356 rf_netbsdkintf.c > > --- rf_netbsdkintf.c23 Jan 2018 22:42:29 - 1.356 > > +++ rf_netbsdkintf.c20 Jan 2019 22:32:14 - > > @@ -472,6 +472,9 @@ > > const char *bootname = device_xname(bdv); > > size_t len = strlen(bootname); > > > > + if (bdv == NULL) > > + return 0; > > + > > for (int col = 0; col < r->numCol; col++) { > > const char *devname = r->Disks[col].devname; > > devname += sizeof("/dev/") - 1; > > This looked suspicious, even before I read the code. > The question is if it is ever legitimate for bdv to be NULL. That is an excellent point. The short answer is, no it isn't. And it never was NULL in the code that used it. I got a trap into ddb because of a null pointer deref in the DPRINTF that I changed (in the 4th hunk of my patch). > I am a fan of having comments before every function declaring their > preconditions and what they guarantee on exit. Then all uses can be > audited if they guarantee the the preconditions are true. This approach > is really hard-core in eiffel, known as design by contract. Yes, I totally agree. Also to the rest of your message that I didn't quote. When I prepared the patch yesterday I was about to delete the above change because at first I couldn't remember why I added it ~3 weeks ago. That should have raised a big fat warning sign. I thought about adding a comment after I read your private mail earlier today. In the end I decided it is better to not change rf_containsboot() and instead introduce a wrapper for the benefit of the DPRINTF. I think the following is better. Compile-tested only for both #ifdef conditions, but I think that is OK. Index: rf_netbsdkintf.c === RCS file: /cvsroot/src/sys/dev/raidframe/rf_netbsdkintf.c,v retrieving revision 1.356 diff -u -r1.356 rf_netbsdkintf.c --- rf_netbsdkintf.c23 Jan 2018 22:42:29 - 1.356 +++ rf_netbsdkintf.c21 Jan 2019 15:01:24 - @@ -491,6 +491,15 @@ return 0; } +#ifdef DEBUG_ROOT +static int +debug_rf_containsboot(RF_Raid_t *r, device_t bdv) { + if (bdv == NULL) + return 0; + return rf_containsboot(r, bdv); +} +#endif + void rf_buildroothack(RF_ConfigSet_t *config_sets) { @@ -509,8 +518,8 @@ cset->ac->clabel->autoconfigure == 1) { sc = rf_auto_config_set(cset); if (sc != NULL) { - aprint_debug("raid%d: configured ok\n", - sc->sc_unit); + aprint_debug("raid%d: configured ok, rootable %d\n", + sc->sc_unit, cset->rootable); if (cset->rootable) { rsc = sc; num_root++; @@ -534,8 +543,10 @@ /* if the user has specified what the root device should be then we don't touch booted_device or boothowto... */ - if (rootspec != NULL) + if (rootspec != NULL) { + DPRINTF("%s: rootspec %s\n", __func__, rootspec); return; + } /* we found something bootable... */ @@ -577,9 +588,9 @@ candidate_root = dksc->sc_dev; DPRINTF("%s: candidate root=%p\n", __func__, candidate_root); DPRINTF("%s: booted_device=%p root_partition=%d " - "contains_boot=%d\n", __func__, booted_device, - rsc->sc_r.root_partition, - rf_containsboot(>sc_r, booted_device)); + "contains_boot=%d", + __func__, booted_device, rsc->sc_r.root_partition, + debug_rf_containsboot(>sc_r, booted_device)); if (booted_device == NULL || rsc->sc_r.root_partition == 1 || rf_containsboot(>sc_r, booted_device)) {
Re: patch: debug instrumentation for dev/raidframe/rf_netbsdkintf.c
Christoph Badura writes: > Here is some instrumentation I found useful during my recent debugging. > If there are no objections, I'd like to commit soon. > > The change to rf_containsroot() simplifies the second DPRINTF that I added. > > Index: rf_netbsdkintf.c > === > RCS file: /cvsroot/src/sys/dev/raidframe/rf_netbsdkintf.c,v > retrieving revision 1.356 > diff -u -r1.356 rf_netbsdkintf.c > --- rf_netbsdkintf.c 23 Jan 2018 22:42:29 - 1.356 > +++ rf_netbsdkintf.c 20 Jan 2019 22:32:14 - > @@ -472,6 +472,9 @@ > const char *bootname = device_xname(bdv); > size_t len = strlen(bootname); > > + if (bdv == NULL) > + return 0; > + > for (int col = 0; col < r->numCol; col++) { > const char *devname = r->Disks[col].devname; > devname += sizeof("/dev/") - 1; This looked suspicious, even before I read the code. The question is if it is ever legitimate for bdv to be NULL. I am a fan of having comments before every function declaring their preconditions and what they guarantee on exit. Then all uses can be audited if they guarantee the the preconditions are true. This approach is really hard-core in eiffel, known as design by contract. In NetBSD, many functions have KASSERT at the beginning. This checks them (under DIAGNOSTIC) but it also is a way of documenting the rules. >From a quick glance at the code it seems obvious that it's not ok to call these functions with a NULL bdv. So if bdv is an argument and not allowed to be NULL, then early on in that function, where you check/return, there should be KASSERT(bdv != NULL) Not really on point, but as a caution there should be no behavior change in any function under DIAGNOSTIC, if the code is bug free and preconditions are met. So "if something we can rely on isn't true, panic" is fine, but many other things rae not.
Re: XEN3_DOM0 cpu_bootconf() and bootdev=xxx failure
b...@bsd.de (Christoph Badura) writes: >> rootspec is set by config(1) for three cases: >> config root on major 4 minor 4 >> - sets rootspec to "sd0e" (or maybe "<4/4>") and rootdev to makedev(4,4) >> config root on sd0e >> - sets rootspec to "sd0e" and rootdev to makedev(4,4) >> config root on "some string constant" >> - sets rootspec to "some string constant" and rootdev to NODEV. >There is also: >config root on ? >- sets rootspec to NULL and rootdev to NODEV >and the interesting case of a network device: >config on wm0 >- sets rootspec to "wm0" and rootdev to NODEV explicitly. Yes, I listed the cases where rootspec is set to non-NULL since we talk about it being augmented by bootspec. As for a network interface, there is no major number, so yes, rootdev can only be set to NODEV and the interface must be passed as rootspec, so in the end it is treated like the third case. >Also, one has to use a spec string to put root on dk(4) or a wedge name. >The former case is because dk(4)s don't exist at config(1) time. Yes, thanks to how dk(4) is configured. However, you can specify it by major/minor and get: const char *rootspec = "dk0a"; dev_t rootdev = makedev(168, 0); /* dk0a */ Could even work. >> If rootspec is set the kernel checks for >> - rootspec specifying a network interface -> network boot >> - rootdev == NODEV -> resolve rootspec (as a wedge name) >> - rootdev != NODEV -> Use rootdev, ignore rootspec but which is same as >> rootdev >Can you give me the line numbers in kern_subr.c where you think those >decisions are made? I can't find the code in question. rootdev is never >used in a comparison in kern_subr.c. Check the 'root on ' path (netbsd-8 source for reference): the network check is line 428. the rootspec for disks is used in line 433. the following code uses rootdev (which is equivalent to rootspec). Obviously you don't see a direct comparison of rootdev. Line 433 checks for rootspec being "dk" (originally, it now also includes "flash"), as you've written above that catches the wedge case with rootdev = NODEV. >> The intention is to resolve the 3 cases (network interface, disk device, >> wedge name), and rootspec basically overrides rootdev (in the second case >> they specify the same device, so nothing is "overridden" even when the >> code takes the rootdev value). >I'm unclear what the you mean by "in the second case". second case in my list is 'config root on sd0e'. >> >I.e. rootspec and rootdev need to be set in pairs. >> >> If rootspec isn't set by config(1), we talk about a wildcard boot and >> rootdev is already set to NODEV. So that's exactly the third case provided >> by config(1). And if I'm not mistaken then >> config root on "raid0" >> doesn't work either. >Probably. My proposed change makes that work, though. Indeed, but it also adds another place where rootspec is evaluated. >The idea in that change is that if parsedisk() can resolve the device in >bootspec, that we simulate what "config root on bootspec" would >have done. That does handle the dk(4) and wedge name cases too, because >these devices now exist. config root on "bootspec" sets rootdev to NODEV. >> So we could >> - explicitely clear rootdev to NODEV to not rely on config(1) defaults >> when using bootspec. >That doesn't work because then rootdev would never get set and it has to >be set after you get to haveroot:. if (rootspec == NULL) { rootspec = bootspec; rootdev = NODEV; } works fine. rootspec being NULL is the wildcard case and setting rootdev to NODEV is a no-op (assuming config works as it works now). >> - make the code that resolves rootspec also handle regular disks and >> not just dk (and flash) devices. This also keeps that logic in a >> single place. >I have thought about why > if (dv != NULL && device_class(dv) == DV_DISK && > !DEV_USES_PARTITIONS(dv) && > (majdev = devsw_name2blk(device_xname(dv), NULL, 0)) >= 0) { > rootdv = dv; > rootdev = makedev(majdev, device_unit(dv)); > goto haveroot; > } >doesn't do the same thing as > if (DEV_USES_PARTITIONS(bootdv)) > rootdev = MAKEDISKDEV(majdev, > device_unit(bootdv), > bootpartition); > else > rootdev = makedev(majdev, device_unit(bootdv)); >in the rootspec == NULL case. It skips the case where the device is a regular disk with partitions. >That is another candidate for a likely fix. See my patch. http://ftp.netbsd.org/pub/NetBSD/misc/mlelstv/kern_subr.diff >Anyway, I see nothing wrong on a technical level with my change. >There's a lot to clean up and actually factor out related to setroot() >anyway. I don't like that it adds another