Re: patch: debug instrumentation for dev/raidframe/rf_netbsdkintf.c

2019-01-22 Thread Christos Zoulas
In article <20190122173641.ga26...@irregular-apocalypse.k.bsd.de>,
Christoph Badura   wrote:
>On Mon, Jan 21, 2019 at 11:44:04PM +0100, Christoph Badura wrote:
>> On Tue, Jan 22, 2019 at 08:32:48AM +1100, matthew green wrote:
>> > > > @@ -472,6 +472,9 @@
>> > > >const char *bootname = device_xname(bdv);
>> > > >size_t len = strlen(bootname);
>> > > >  
>> > > > +  if (bdv == NULL)
>> > > > +  return 0;
>> > > > +
>> > > 
>> > > This looked suspicious, even before I read the code.
>> > > 
>> > > The question is if it is ever legitimate for bdv to be NULL.
>> > 
>> > infact, bdv has already been dereferenced at this point:
>> > the assignment to bootname 3 lines up.
>> 
>> Argh.  That's what I get for first removing the code I added which
>> correctly moved the initialization of bootname and len to after the
>> bdv==NULL check and the re-adding it at midnight.
>
>Take 3.  Back to what I intended to submit in the first place.
>
>booted_device is NULL when booting a Xen kernel.  In that case it is OK
>for rf_containsboot() to return 0 because the raid set can't contain "it".
>This matches the checkes for booted_device == NULL near the two
>invocations of rf_containsboot().
>
>I think the check for booted_device == NULL in the last hunk can be
>omitted, but I can't easily test right now.  So I marked it XXX and would
>leave it for later.

LGTM.

christos



Re: patch: debug instrumentation for dev/raidframe/rf_netbsdkintf.c

2019-01-22 Thread Christoph Badura
On Mon, Jan 21, 2019 at 11:44:04PM +0100, Christoph Badura wrote:
> On Tue, Jan 22, 2019 at 08:32:48AM +1100, matthew green wrote:
> > > > @@ -472,6 +472,9 @@
> > > > const char *bootname = device_xname(bdv);
> > > > size_t len = strlen(bootname);
> > > >  
> > > > +   if (bdv == NULL)
> > > > +   return 0;
> > > > +
> > > 
> > > This looked suspicious, even before I read the code.
> > > 
> > > The question is if it is ever legitimate for bdv to be NULL.
> > 
> > infact, bdv has already been dereferenced at this point:
> > the assignment to bootname 3 lines up.
> 
> Argh.  That's what I get for first removing the code I added which
> correctly moved the initialization of bootname and len to after the
> bdv==NULL check and the re-adding it at midnight.

Take 3.  Back to what I intended to submit in the first place.

booted_device is NULL when booting a Xen kernel.  In that case it is OK
for rf_containsboot() to return 0 because the raid set can't contain "it".
This matches the checkes for booted_device == NULL near the two
invocations of rf_containsboot().

I think the check for booted_device == NULL in the last hunk can be
omitted, but I can't easily test right now.  So I marked it XXX and would
leave it for later.

--chris

Index: rf_netbsdkintf.c
===
RCS file: /cvsroot/src/sys/dev/raidframe/rf_netbsdkintf.c,v
retrieving revision 1.356
diff -u -r1.356 rf_netbsdkintf.c
--- rf_netbsdkintf.c23 Jan 2018 22:42:29 -  1.356
+++ rf_netbsdkintf.c22 Jan 2019 17:30:43 -
@@ -469,8 +469,15 @@
 
 static int
 rf_containsboot(RF_Raid_t *r, device_t bdv) {
-   const char *bootname = device_xname(bdv);
-   size_t len = strlen(bootname);
+   const char *bootname;
+   size_t len;
+
+   /* if bdv is NULL, the set can't contain it. exit early. */
+   if (bdv == NULL)
+   return 0;
+
+   bootname = device_xname(bdv);
+   len = strlen(bootname);
 
for (int col = 0; col < r->numCol; col++) {
const char *devname = r->Disks[col].devname;
@@ -509,8 +516,8 @@
cset->ac->clabel->autoconfigure == 1) {
sc = rf_auto_config_set(cset);
if (sc != NULL) {
-   aprint_debug("raid%d: configured ok\n",
-   sc->sc_unit);
+   aprint_debug("raid%d: configured ok, rootable 
%d\n",
+   sc->sc_unit, cset->rootable);
if (cset->rootable) {
rsc = sc;
num_root++;
@@ -534,8 +541,10 @@
/* if the user has specified what the root device should be
   then we don't touch booted_device or boothowto... */
 
-   if (rootspec != NULL)
+   if (rootspec != NULL) {
+   DPRINTF("%s: rootspec %s\n", __func__, rootspec);
return;
+   }
 
/* we found something bootable... */
 
@@ -577,9 +586,12 @@
candidate_root = dksc->sc_dev;
DPRINTF("%s: candidate root=%p\n", __func__, candidate_root);
DPRINTF("%s: booted_device=%p root_partition=%d "
-  "contains_boot=%d\n", __func__, booted_device,
-  rsc->sc_r.root_partition,
-  rf_containsboot(>sc_r, booted_device));
+   "contains_boot=%d",
+   __func__, booted_device, rsc->sc_r.root_partition,
+  rf_containsboot(>sc_r, booted_device));
+   /* XXX the check for booted_device == NULL can probably be
+* dropped, now that rf_containsboot handles that case.
+*/
if (booted_device == NULL ||
rsc->sc_r.root_partition == 1 ||
rf_containsboot(>sc_r, booted_device)) {


Re: patch: debug instrumentation for dev/raidframe/rf_netbsdkintf.c

2019-01-21 Thread Christos Zoulas
In article <24279.1548106...@splode.eterna.com.au>,
matthew green   wrote:
>> > @@ -472,6 +472,9 @@
>> >const char *bootname = device_xname(bdv);
>> >size_t len = strlen(bootname);
>> >  
>> > +  if (bdv == NULL)
>> > +  return 0;
>> > +
>> 
>> This looked suspicious, even before I read the code.
>> 
>> The question is if it is ever legitimate for bdv to be NULL.
>
>infact, bdv has already been dereferenced at this point:
>the assignment to bootname 3 lines up.

Yes, that change should be reverted.

christos



Re: patch: debug instrumentation for dev/raidframe/rf_netbsdkintf.c

2019-01-21 Thread Christoph Badura
On Tue, Jan 22, 2019 at 08:32:48AM +1100, matthew green wrote:
> > > @@ -472,6 +472,9 @@
> > >   const char *bootname = device_xname(bdv);
> > >   size_t len = strlen(bootname);
> > >  
> > > + if (bdv == NULL)
> > > + return 0;
> > > +
> > 
> > This looked suspicious, even before I read the code.
> > 
> > The question is if it is ever legitimate for bdv to be NULL.
> 
> infact, bdv has already been dereferenced at this point:
> the assignment to bootname 3 lines up.

Argh.  That's what I get for first removing the code I added which
correctly moved the initialization of bootname and len to after the
bdv==NULL check and the re-adding it at midnight.

--chris


re: patch: debug instrumentation for dev/raidframe/rf_netbsdkintf.c

2019-01-21 Thread matthew green
> >>> @@ -472,6 +472,9 @@
> >>>   const char *bootname = device_xname(bdv);
> >>>   size_t len = strlen(bootname);
> >>>
> >>> + if (bdv == NULL)
> >>> + return 0;
> >>> +
> >>
> >> This looked suspicious, even before I read the code.
> >>
> >> The question is if it is ever legitimate for bdv to be NULL.
> >
> > infact, bdv has already been dereferenced at this point:
> > the assignment to bootname 3 lines up.
> 
> So, if we're going to insert a KASSERT() we would need to separate
> the assignment from bootname's declaration, and do the assignment
> after the KASSERT()

i am not a huge fan of KASSERT(ptr != NULL).  page fault is
pretty much just as clear as the assert, and does not need
special code to handle it.

that said, i do see some utility in it as a guide to others
reading this code, and won't object to this usage.


.mrg.


re: patch: debug instrumentation for dev/raidframe/rf_netbsdkintf.c

2019-01-21 Thread Paul Goyette

On Tue, 22 Jan 2019, matthew green wrote:


@@ -472,6 +472,9 @@
const char *bootname = device_xname(bdv);
size_t len = strlen(bootname);

+   if (bdv == NULL)
+   return 0;
+


This looked suspicious, even before I read the code.

The question is if it is ever legitimate for bdv to be NULL.


infact, bdv has already been dereferenced at this point:
the assignment to bootname 3 lines up.


So, if we're going to insert a KASSERT() we would need to separate
the assignment from bootname's declaration, and do the assignment
after the KASSERT()


+--+--++
| Paul Goyette | PGP Key fingerprint: | E-mail addresses:  |
| (Retired)| FA29 0E3B 35AF E8AE 6651 | paul at whooppee dot com   |
| Kernel Developer | 0786 F758 55DE 53BA 7731 | pgoyette at netbsd dot org |
+--+--++


Re: patch: debug instrumentation for dev/raidframe/rf_netbsdkintf.c

2019-01-21 Thread Greg Troxel


Christoph Badura  writes:

> On Mon, Jan 21, 2019 at 04:24:49PM -0500, Greg Troxel wrote:
>> Separetaly from debug code being careful, if it's a rule that bdv can't
>> be NULL, it's just as well to put in a KASSERT.  Then we'll find out
>> where that isn't true and can fix it.
>
> I must not be getting something.  If rf_containsboot() is passed a NULL
> pointer, it will trap with a page fault and we can get a stacktrace from
> ddb.  If we add a KASSERT it will panic and we can get a stacktrace from
> ddb.  I don't see where the benefit in that is.

The benefit is that the panic from the KASSERT is cleaner, and it
documents for readers of the function that the author believes it is a
rule.   And it will definitely fault even on machines will can
dereference NULL - that is technically if not practically architecture 
dependent.

> Do you think we should add a KASSERT to document that rf_containsboot()
> does expect a valid pointer? I'd see value in that and would go ahead with
> it.

Yes.  Basically, in any kernel function, if there is a requirement that
a pointer be non-NULL, then there should be a KASSERT and the code
should then feel free to assume it is valid.

When a KASSERT is hit, the user gets a message with the KASSERT
expression and the source file/line, instead of a page fault traceback.
It's very easy and quick to go from that printout to the KASSERT that
failed.

Plus, adding the KASSERT, or talking about adding it, is a good way to
check if there is consensus among the other developers that this really
is a rule.  In NetBSD, people are really good at telling you you're
wrong!


re: patch: debug instrumentation for dev/raidframe/rf_netbsdkintf.c

2019-01-21 Thread matthew green
> > @@ -472,6 +472,9 @@
> > const char *bootname = device_xname(bdv);
> > size_t len = strlen(bootname);
> >  
> > +   if (bdv == NULL)
> > +   return 0;
> > +
> 
> This looked suspicious, even before I read the code.
> 
> The question is if it is ever legitimate for bdv to be NULL.

infact, bdv has already been dereferenced at this point:
the assignment to bootname 3 lines up.


.mrg.


Re: patch: debug instrumentation for dev/raidframe/rf_netbsdkintf.c

2019-01-21 Thread Christoph Badura
On Mon, Jan 21, 2019 at 04:24:49PM -0500, Greg Troxel wrote:
> Separetaly from debug code being careful, if it's a rule that bdv can't
> be NULL, it's just as well to put in a KASSERT.  Then we'll find out
> where that isn't true and can fix it.

I must not be getting something.  If rf_containsboot() is passed a NULL
pointer, it will trap with a page fault and we can get a stacktrace from
ddb.  If we add a KASSERT it will panic and we can get a stacktrace from
ddb.  I don't see where the benefit in that is.

Do you think we should add a KASSERT to document that rf_containsboot()
does expect a valid pointer? I'd see value in that and would go ahead with
it.

--chris


Re: patch: debug instrumentation for dev/raidframe/rf_netbsdkintf.c

2019-01-21 Thread Greg Troxel
Christoph Badura  writes:

>> > +  if (bdv == NULL)
>> > +  return 0;
>> > +
>> 
>> This looked suspicious, even before I read the code.
>> The question is if it is ever legitimate for bdv to be NULL.
>
> That is an excellent point.  The short answer is, no it isn't.  And it
> never was NULL in the code that used it.  I got a trap into ddb because of
> a null pointer deref in the DPRINTF that I changed (in the 4th hunk of my
> patch).
>
>> I am a fan of having comments before every function declaring their
>> preconditions and what they guarantee on exit.  Then all uses can be
>> audited if they guarantee the the preconditions are true.  This approach
>> is really hard-core in eiffel, known as design by contract.
>
> Yes, I totally agree.  Also to the rest of your message that I didn't quote.
>
> When I prepared the patch yesterday I was about to delete the above change
> because at first I couldn't remember why I added it ~3 weeks ago.  That
> should have raised a big fat warning sign.
>
> I thought about adding a comment after I read your private mail
> earlier today.  In the end I decided it is better to not change
> rf_containsboot() and instead introduce a wrapper for the benefit of the
> DPRINTF.

Separetaly from debug code being careful, if it's a rule that bdv can't
be NULL, it's just as well to put in a KASSERT.  Then we'll find out
where that isn't true and can fix it.



Re: patch: debug instrumentation for dev/raidframe/rf_netbsdkintf.c

2019-01-21 Thread Christoph Badura
On Mon, Jan 21, 2019 at 06:36:37PM +0100, Christoph Badura wrote:
> I think the following is better.  Compile-tested only for both #ifdef
> conditions, but I think that is OK.

Ugh. I forgot to put a comment on that function.  How about this:

/*
 * Provide a wrapper around rf_containsboot that handles NULL pointers
 * gradefully.  For use in DPRINTF().
 */

> Index: rf_netbsdkintf.c
> ===
> RCS file: /cvsroot/src/sys/dev/raidframe/rf_netbsdkintf.c,v
> retrieving revision 1.356
> diff -u -r1.356 rf_netbsdkintf.c
> --- rf_netbsdkintf.c  23 Jan 2018 22:42:29 -  1.356
> +++ rf_netbsdkintf.c  21 Jan 2019 15:01:24 -
> @@ -491,6 +491,15 @@
>   return 0;
>  }
>  
> +#ifdef DEBUG_ROOT
> +static int
> +debug_rf_containsboot(RF_Raid_t *r, device_t bdv) {
> + if (bdv == NULL)
> + return 0;
> + return rf_containsboot(r, bdv);
> +}
> +#endif
> +

--chris


Re: patch: debug instrumentation for dev/raidframe/rf_netbsdkintf.c

2019-01-21 Thread Christoph Badura
On Mon, Jan 21, 2019 at 10:42:10AM -0500, Greg Troxel wrote:
> Christoph Badura  writes:
> > Index: rf_netbsdkintf.c
> > ===
> > RCS file: /cvsroot/src/sys/dev/raidframe/rf_netbsdkintf.c,v
> > retrieving revision 1.356
> > diff -u -r1.356 rf_netbsdkintf.c
> > --- rf_netbsdkintf.c23 Jan 2018 22:42:29 -  1.356
> > +++ rf_netbsdkintf.c20 Jan 2019 22:32:14 -
> > @@ -472,6 +472,9 @@
> > const char *bootname = device_xname(bdv);
> > size_t len = strlen(bootname);
> >  
> > +   if (bdv == NULL)
> > +   return 0;
> > +
> > for (int col = 0; col < r->numCol; col++) {
> > const char *devname = r->Disks[col].devname;
> > devname += sizeof("/dev/") - 1;
> 
> This looked suspicious, even before I read the code.
> The question is if it is ever legitimate for bdv to be NULL.

That is an excellent point.  The short answer is, no it isn't.  And it
never was NULL in the code that used it.  I got a trap into ddb because of
a null pointer deref in the DPRINTF that I changed (in the 4th hunk of my
patch).

> I am a fan of having comments before every function declaring their
> preconditions and what they guarantee on exit.  Then all uses can be
> audited if they guarantee the the preconditions are true.  This approach
> is really hard-core in eiffel, known as design by contract.

Yes, I totally agree.  Also to the rest of your message that I didn't quote.

When I prepared the patch yesterday I was about to delete the above change
because at first I couldn't remember why I added it ~3 weeks ago.  That
should have raised a big fat warning sign.

I thought about adding a comment after I read your private mail
earlier today.  In the end I decided it is better to not change
rf_containsboot() and instead introduce a wrapper for the benefit of the
DPRINTF.

I think the following is better.  Compile-tested only for both #ifdef
conditions, but I think that is OK.

Index: rf_netbsdkintf.c
===
RCS file: /cvsroot/src/sys/dev/raidframe/rf_netbsdkintf.c,v
retrieving revision 1.356
diff -u -r1.356 rf_netbsdkintf.c
--- rf_netbsdkintf.c23 Jan 2018 22:42:29 -  1.356
+++ rf_netbsdkintf.c21 Jan 2019 15:01:24 -
@@ -491,6 +491,15 @@
return 0;
 }
 
+#ifdef DEBUG_ROOT
+static int
+debug_rf_containsboot(RF_Raid_t *r, device_t bdv) {
+   if (bdv == NULL)
+   return 0;
+   return rf_containsboot(r, bdv);
+}
+#endif
+
 void
 rf_buildroothack(RF_ConfigSet_t *config_sets)
 {
@@ -509,8 +518,8 @@
cset->ac->clabel->autoconfigure == 1) {
sc = rf_auto_config_set(cset);
if (sc != NULL) {
-   aprint_debug("raid%d: configured ok\n",
-   sc->sc_unit);
+   aprint_debug("raid%d: configured ok, rootable 
%d\n",
+   sc->sc_unit, cset->rootable);
if (cset->rootable) {
rsc = sc;
num_root++;
@@ -534,8 +543,10 @@
/* if the user has specified what the root device should be
   then we don't touch booted_device or boothowto... */
 
-   if (rootspec != NULL)
+   if (rootspec != NULL) {
+   DPRINTF("%s: rootspec %s\n", __func__, rootspec);
return;
+   }
 
/* we found something bootable... */
 
@@ -577,9 +588,9 @@
candidate_root = dksc->sc_dev;
DPRINTF("%s: candidate root=%p\n", __func__, candidate_root);
DPRINTF("%s: booted_device=%p root_partition=%d "
-  "contains_boot=%d\n", __func__, booted_device,
-  rsc->sc_r.root_partition,
-  rf_containsboot(>sc_r, booted_device));
+   "contains_boot=%d",
+   __func__, booted_device, rsc->sc_r.root_partition,
+  debug_rf_containsboot(>sc_r, booted_device));
if (booted_device == NULL ||
rsc->sc_r.root_partition == 1 ||
rf_containsboot(>sc_r, booted_device)) {


Re: patch: debug instrumentation for dev/raidframe/rf_netbsdkintf.c

2019-01-21 Thread Greg Troxel
Christoph Badura  writes:

> Here is some instrumentation I found useful during my recent debugging.
> If there are no objections, I'd like to commit soon.
>
> The change to rf_containsroot() simplifies the second DPRINTF that I added.
>
> Index: rf_netbsdkintf.c
> ===
> RCS file: /cvsroot/src/sys/dev/raidframe/rf_netbsdkintf.c,v
> retrieving revision 1.356
> diff -u -r1.356 rf_netbsdkintf.c
> --- rf_netbsdkintf.c  23 Jan 2018 22:42:29 -  1.356
> +++ rf_netbsdkintf.c  20 Jan 2019 22:32:14 -
> @@ -472,6 +472,9 @@
>   const char *bootname = device_xname(bdv);
>   size_t len = strlen(bootname);
>  
> + if (bdv == NULL)
> + return 0;
> +
>   for (int col = 0; col < r->numCol; col++) {
>   const char *devname = r->Disks[col].devname;
>   devname += sizeof("/dev/") - 1;

This looked suspicious, even before I read the code.

The question is if it is ever legitimate for bdv to be NULL.

I am a fan of having comments before every function declaring their
preconditions and what they guarantee on exit.  Then all uses can be
audited if they guarantee the the preconditions are true.  This approach
is really hard-core in eiffel, known as design by contract.

In NetBSD, many functions have KASSERT at the beginning.  This checks them
(under DIAGNOSTIC) but it also is a way of documenting the rules.

>From a quick glance at the code it seems obvious that it's not ok to
call these functions with a NULL bdv.

So if bdv is an argument and not allowed to be NULL, then early on in
that function, where you check/return, there should be

  KASSERT(bdv != NULL)

Not really on point, but as a caution there should be no behavior change
in any function under DIAGNOSTIC, if the code is bug free and
preconditions are met.  So "if something we can rely on isn't true,
panic" is fine, but many other things rae not.