Re: Yet another dirct config proposal for i2c busses
- | Paul Goyette | PGP DSS Key fingerprint: | E-mail addresses: | | Customer Service | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com | | Network Engineer | 0786 F758 55DE 53BA 7731 | pgoyette at juniper.net | | Kernel Developer | | pgoyette at netbsd.org | -
Re: regression (crash) in sysmon/acpiacad
On Thu, 4 Feb 2010, Matthias Drochner wrote: Hi - my laptop (running -current) now crashes as soon as the battery reaches warning level. This is due to a jump to NULL in sysmon_envsys_events.c, line 891 - an indirect call to sme-sme_refresh. This is indeed 0 for acpiacad now, since the refresh method was removed recently. How is this intended to work? Ooops. The refresh routine was removed because we're now relying on the notify code to actually work. But unfortunately we forgot to set the flag to avoid the call-back. Either Jukka or I will get this fixed up shortly. - | Paul Goyette | PGP DSS Key fingerprint: | E-mail addresses: | | Customer Service | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com | | Network Engineer | 0786 F758 55DE 53BA 7731 | pgoyette at juniper.net | | Kernel Developer | | pgoyette at netbsd.org | -
Re: regression (crash) in sysmon/acpiacad
On Thu, 4 Feb 2010, Paul Goyette wrote: On Thu, 4 Feb 2010, Matthias Drochner wrote: my laptop (running -current) now crashes as soon as the battery reaches warning level. This is due to a jump to NULL in sysmon_envsys_events.c, line 891 - an indirect call to sme-sme_refresh. This is indeed 0 for acpiacad now, since the refresh method was removed recently. How is this intended to work? Ooops. The refresh routine was removed because we're now relying on the notify code to actually work. But unfortunately we forgot to set the flag to avoid the call-back. Either Jukka or I will get this fixed up shortly. Actually, Jukka _did_ set the SME_DISABLE_REFRESH flag. Unfortunately, it was not being checked in one of the call-backs. I have just added the missing check. Please update to rev 1.78 of src/sys/dev/sysmon/sysmon_envsys_events.c and try again. Thanks for finding this, and please accept my apologies for the breakage. - | Paul Goyette | PGP DSS Key fingerprint: | E-mail addresses: | | Customer Service | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com | | Network Engineer | 0786 F758 55DE 53BA 7731 | pgoyette at juniper.net | | Kernel Developer | | pgoyette at netbsd.org | -
Re: regression (crash) in sysmon/acpiacad
On Thu, 4 Feb 2010, Matthias Drochner wrote: p...@whooppee.com said: Please update to rev 1.78 of src/sys/dev/sysmon/sysmon_envsys_events.c and try again. Thanks -- I just managed to run low on battery with a new kernel. I can confirm that the problem is fixed, the box runs well until graceful shutdown by the sensor_battery script. But -- since my battery was low anyway I did another test, whether it behaves well if the userland powerd is not running. It does not. I'd expect it to hit the cpu_reboot() call in sysmon_penvsys_event() eventually, but it did not. It didn't even update the charge value visible in envstat(8) -- this stayed at 6.00% (this is where I killed powerd) even when the red battery LED on my (Dell) laptop started to light permanently, which happens below 3% (the low cap limit). This used to work, so there is another regression. OK, we'll look into it. Since the charge value was not updating, it might be that the ACPI Notify isn't working here. Perhaps we should not rely on this, and put the refresh() routine back... Since I don't have any battery-powered devices, I'll defer to Jukka to see what we can do. - | Paul Goyette | PGP DSS Key fingerprint: | E-mail addresses: | | Customer Service | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com | | Network Engineer | 0786 F758 55DE 53BA 7731 | pgoyette at juniper.net | | Kernel Developer | | pgoyette at netbsd.org | -
Re: regression (crash) in sysmon/acpiacad
On Fri, 5 Feb 2010, Matthias Drochner wrote: p...@whooppee.com said: Regarding the earlier comment about your battery charge level not changing... Is this still true? Just checked again and found that the charge level is updated now, but only every 30s. (The code in acpibat_refresh() suggests that cached values are kept only for 5s, and I believe that is how it did behave formerly.) The 30-second timer is the default poll interval for sysmon_envsys, so we're calling the refresh routine on that basis. I'm not totally up to speed on the acpibat internal stuff, so I don't know if it should be getting updated more frequently. I definitely watched the battery state for a couple of minutes yesterday, so something else must have happened. Looking at acpibat_refresh()... the 5-seconds check uses microtime() which is not necessarily monotonic. And, iirc, I had booted Windows before, which sets the RTC to GMT+1, so I had to force the clock back by a manual ntpdate. This might have killed status updating. Well, just a hypothesis, but I'd suggest to use getmicrouptime() for these checks. Yeah, good point. - | Paul Goyette | PGP DSS Key fingerprint: | E-mail addresses: | | Customer Service | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com | | Network Engineer | 0786 F758 55DE 53BA 7731 | pgoyette at juniper.net | | Kernel Developer | | pgoyette at netbsd.org | -
Re: regression (crash) in sysmon/acpiacad
On Sun, 7 Feb 2010, Jukka Ruohonen wrote: On Sun, Feb 07, 2010 at 02:24:37PM +0100, Joerg Sonnenberger wrote: The point is that they have specific meaning for the hardware and as I said, are generally not changeable. For example, on my laptop, the power LED turns orange when warning cap is reached and shuts down hard on critical. You can already alter the low and warning capacity, as documented in envsys.conf(5). I am not arguing that it would be a good idea, quite the contrary. You can't necessarily change the values that the hardware monitors, but you can set the limits that are used to control the sending of event notifications to powerd. (A sensor can also set the ENVSYS_MONNOTSUPP flag to disable changing those limits.) If sme_set_limits() is omitted but sme_get_limits() is introduced, the visible change would merely imply that from $ envstat -d acpibat0 Current CritMax WarnMax WarnMin CritMin Unit warn cap: 2.511 Wh ( 5.00%) low cap: 0.200 Wh ( 0.40%) charge:50.230 Wh (100.00%) we would move to $ envstat -d acpibat0 | grep cap Current CritMax WarnMax WarnMin CritMin Unit ... charge:50.230 2.5110.200 Wh (100.00%) At least currently, sensors with the ENVSYS_FPERCENT flag display all of their limit values as percentages. So the output would be more like charge:50.230 5.000% 0.400% Wh (100.00%) Similar thing is already done in acpitz(4), sdtemp(4), and ipmi(4). The argumentation in favor of the percentages goes in similar vein. As I understand it, these callbacks were introduced specifically for those sensors that are capable of checking the limits in hardware. Supposedly this should be the case here. Yes, the primary rationale for these callbacks is to provide a means for userland to obtain (and possibly modify) the values used by the hardware device to raise alarms. At the time I was writing the sdtemp(4) driver, it seemed silly to always compare current-vs-limit in software when the chip was more than happy to do that task for us, if we would only set a threshold value for it to use! - | Paul Goyette | PGP DSS Key fingerprint: | E-mail addresses: | | Customer Service | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com | | Network Engineer | 0786 F758 55DE 53BA 7731 | pgoyette at juniper.net | | Kernel Developer | | pgoyette at netbsd.org | -
Re: need help with kern/42758 - correctly initialize W83627HF hwmon
I'll have a look. On Sun, 7 Feb 2010, Michael Stapelberg wrote: Hi, Yesterday I debugged why the temperature values seen in envstat on my mainboard looked very odd (70 degC while the bios hardware monitor shows 40 degC). As I described in the bugreport, it is a missing initialization. I also attached a patch to the bugreport, which fixes the problem to me. However, I am not a NetBSD kernel developer, so the patch very likely needs some polishing. It would be great if someone could have a look at it so that it can go into the next NetBSD release. Best regards, Michael PS: kern/42759 is related to this and a rather simple patch - it needs to be merged before debugging kern/42758 (so that the i2c is accessible on this board). PS: Please CC me, as I am not on this list. - | Paul Goyette | PGP DSS Key fingerprint: | E-mail addresses: | | Customer Service | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com | | Network Engineer | 0786 F758 55DE 53BA 7731 | pgoyette at juniper.net | | Kernel Developer | | pgoyette at netbsd.org | -
Re: need help with kern/42758 - correctly initialize W83627HF hwmon
On Sun, 7 Feb 2010, Michael Stapelberg wrote: Hi Paul, Thanks for your help so far. Excerpts from Paul Goyette's message of So Feb 07 22:06:23 +0100 2010: a) all W83627HF use temp sensors of type 3904, or b) the W83627HF on this board uses 3904. Any chance you can do some research in the datasheet to see if there is any mention of this? The datasheet itself does not mention any preference in regard to the type. In the wiki on lm-sensors.org I could find only one configuration which actually uses the temperature sensors of the W83627HF. This configuration also configures it to use the 3904 mode. Yes, I went through the datasheet, too, and there seems to be no way to tell what type of sensor is actually used. I think it would be OK to add a 'flags' to the config syntax, so you would say lm* at iic? addr 0xXX flags 1 to enable 2N3904 bipolar temp sensors. The default would be to keep current behavior which expects a normal thermistor. Now, I just need to figure out where the flags value gets passed into the driver! - | Paul Goyette | PGP DSS Key fingerprint: | E-mail addresses: | | Customer Service | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com | | Network Engineer | 0786 F758 55DE 53BA 7731 | pgoyette at juniper.net | | Kernel Developer | | pgoyette at netbsd.org | -
Re: need help with kern/42758 - correctly initialize W83627HF hwmon
On Sun, 7 Feb 2010, Paul Goyette wrote: In the wiki on lm-sensors.org I could find only one configuration which actually uses the temperature sensors of the W83627HF. This configuration also configures it to use the 3904 mode. Yes, I went through the datasheet, too, and there seems to be no way to tell what type of sensor is actually used. I think it would be OK to add a 'flags' to the config syntax, so you would say lm* at iic? addr 0xXX flags 1 to enable 2N3904 bipolar temp sensors. The default would be to keep current behavior which expects a normal thermistor. Now, I just need to figure out where the flags value gets passed into the driver! Can you try the attached diff, and set 'flags 1' in your config file? - | Paul Goyette | PGP DSS Key fingerprint: | E-mail addresses: | | Customer Service | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com | | Network Engineer | 0786 F758 55DE 53BA 7731 | pgoyette at juniper.net | | Kernel Developer | | pgoyette at netbsd.org | -Index: nslm7x.c === RCS file: /cvsroot/src/sys/dev/ic/nslm7x.c,v retrieving revision 1.49 diff -u -p -r1.49 nslm7x.c --- nslm7x.c13 Oct 2008 12:44:46 - 1.49 +++ nslm7x.c7 Feb 2010 23:04:42 - @@ -1766,7 +1766,7 @@ static int wb_match(struct lm_softc *sc) { const char *model = NULL; - int banksel, vendid, devid; + int banksel, vendid, devid, regval; aprint_normal(\n); /* Read vendor ID */ @@ -1790,6 +1790,16 @@ wb_match(struct lm_softc *sc) case WB_CHIPID_W83627HF: model = W83627HF; lm_setup_sensors(sc, w83627hf_sensors); + + if (device_cfdata(sc-sc_dev)-cf_flags 1) { + /* Switch to 2N3904 mode for the temperature sensors */ + lm_generic_banksel(sc, WB_BANKSEL_B0); + (*sc-lm_writereg)(sc, WB_BANK0_RESVD1, 0x0); + regval = (*sc-lm_readreg)(sc, 0x5d); + regval |= 0xe; + (*sc-lm_writereg)(sc, WB_BANK0_VBAT, regval); + lm_generic_banksel(sc, banksel); + } break; case WB_CHIPID_W83627THF: model = W83627THF; Index: nslm7xvar.h === RCS file: /cvsroot/src/sys/dev/ic/nslm7xvar.h,v retrieving revision 1.26 diff -u -p -r1.26 nslm7xvar.h --- nslm7xvar.h 12 Oct 2008 13:17:28 - 1.26 +++ nslm7xvar.h 7 Feb 2010 23:04:42 - @@ -87,6 +87,7 @@ /* Bank 0 regs */ #define WB_BANK0_CHIPID0x58/* Chip ID */ +#define WB_BANK0_RESVD10x59/* Resvd, bits 6-4 select temp sensor mode */ #define WB_BANK0_FAN45 0x5c/* Fan 4/5 Divisor Control (W83791D only) */ #define WB_BANK0_VBAT 0x5d/* VBAT Monitor Control */ #define WB_BANK0_FAN4 0xba/* Fan 4 reading (W83791D only) */
Re: need help with kern/42758 - correctly initialize W83627HF hwmon
On Mon, 8 Feb 2010, Michael Stapelberg wrote: Hi Paul, Excerpts from Paul Goyette's message of Mo Feb 08 00:14:44 +0100 2010: Can you try the attached diff, and set 'flags 1' in your config file? The patch works fine. I would suggest to use flag 2, however, to be consistent with the linux driver. The Linux driver seems to do something a little bit strange... From http://oss.sgi.com/cgi-bin/cvsweb.cgi/linux-2.6-xfs/Documentation/hwmon/w83781d?annotate=1.1 The W83782D and W83783S temperature conversion machine understands about several kinds of temperature probes. You can program the so-called beta value in the sensor files. '1' is the PII/Celeron diode, '2' is the TN3904 transistor, and 3435 the default thermistor value. Other values are (not yet) supported. The datasheet for the sensor says Reg. 0x59 bits [654] Temperature sensor diode [321]. Set to 1, select Pentium II compatible Diode. Set to 0 to select 2N3904 Bipolar mode. Power-On default is 1. Reg. 0x5d bits [321] Temperature sensor [321]. Set to 1, select bipolar sensor. Set to 0, select thermistor sensor. Power-On default is 0. So, from the datasheet, there seem to be only two sensor types, with Pentium II Diode being the default, while the Linux docs claim three sensor types and a default of some non-Pentium thermistor. Rather confusing. For now, I'm inclined to do things according to the datasheet. :) Unless anyone objects, I'll commit my patch later today. (And I'll add comments on the flag value in all of the config files that contain the lm(4) device.) - | Paul Goyette | PGP DSS Key fingerprint: | E-mail addresses: | | Customer Service | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com | | Network Engineer | 0786 F758 55DE 53BA 7731 | pgoyette at juniper.net | | Kernel Developer | | pgoyette at netbsd.org | -
Re: wbsio(4): Winbond Super I/O driver for lm(4) over wbsio(4) attachment
On Wed, 17 Feb 2010, Constantine Aleksandrovich Murenin wrote: 1. Does it make sense to extend this to any other functions of the super-IO chip? Are any others likely to ever show up on non- standard ISA ports? Yes, wbsio(4) could later provide Watchdog Timer support, but such timer is accessed directly through the Super I/O ports, not through a separate set of ports as is the case with Hardware Monitor. OK. 2. In the man page, please change the statement +driver provides support for the Winbond LPC Super I/O ICs. to +driver provides configuration support for the Winbond LPC +Super I/O ICs. Interesting point, but perhaps this is being too specific? Specifically since it could later even support the watchdog timer? :-) OK. 3. As written, wbsio_print() assumes that the ir_size of the lm(4)'s isa_attach_args is the same as the size of the wbsio(4)'s ir_size. Is this really correct? Or should we explicitly set the ir_size member before calling config_found()? I don't see such assumption anywhere in wbsio(4). In any case, ir_size is always set in the match procedure of the respective matching driver, e.g. it is still set to 8 for lm(4) in lm_isa_match(): wbsio0 at isa0 port 0x2e-0x2f: Winbond LPC Super I/O W83627DHG rev 0x23 lm0 at wbsio0 port 0x290-0x297 lm0: Using default temp sensors lm0: Winbond W83627DHG Hardware monitor OK, I was not aware that each device's *_match() routine would set the ir_size. Thanks for clarifying. Curiosity: I will confess that I'm writing this section without datasheets in front of me... But I notice that there are some differences in the lists of devices between wbsio(4) and lm(4). A few IDs are the same, but most are different. I also notice that wbsio(4) uses the device-ID, while lm(4) uses the chip-ID. Question: 4. So, would it make sense to be consistent and use the same xxx-ID in both drivers? (In which case, wbsio(4) could simply include nslm7xvar.h rather than having its own ID definitions.) Your observation is correct- the namespace for IDs is different, so it would not make much sense to unify this with nslm7xvar.h in any way. OK Question: 5. And, does the list of devices in wbsio(4) deliberately exclude the rest of the lm(4) devices? Or could some/all of the remaining chips from lm(4) be added to wbsio(4)? I think all supported devices were actually already included, and since wbsio(4) was written, no new devices were introduced to lm(4). Obviously, not all of those supported by lm(4) could ever show up on the isa bus - for example, W83792D and a few others are SMBus-only. OK. You've addressed all my concerns. Let's give a couple days for others to comment. - | Paul Goyette | PGP DSS Key fingerprint: | E-mail addresses: | | Customer Service | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com | | Network Engineer | 0786 F758 55DE 53BA 7731 | pgoyette at juniper.net | | Kernel Developer | | pgoyette at netbsd.org | -
Proposed changes to the dbcool(4) driver
Folks, When the dbcool(4) driver was commited, sysmon_envsys(9) did not have the ability to retrieve and set sensor limit values in the hardware device. As a result, dbcool had a separate user interface, using sysctl(8), for this purpose. Now that sysmon_envsys has grown the required capability (and it is actually working!), I'd like to remove the bulk of the sysctl interface from the dbcool driver, and provide that functionality using sysmon's capabilities. (I'll leave intact the ability to control the fan-speed PWM controllers, since there's still no viable alternative.) Does anyone see any reason to leave the sysctl interface in place? If we do leave it, can/should it be enabled/disabled using #define DBCOOL_SYSCTL ... #endif My preference would be to remove the sensor limit sysctl completely. - | Paul Goyette | PGP DSS Key fingerprint: | E-mail addresses: | | Customer Service | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com | | Network Engineer | 0786 F758 55DE 53BA 7731 | pgoyette at juniper.net | | Kernel Developer | | pgoyette at netbsd.org | -
Exposing internal sysmon_envsys(8) structures to userland
It has always bothered me that src/sys/sys/envsys.h exposes several internal data structures to userland. It seems to me that the only items from this header file that should really be exported are * the definitions for the various IOCTLs * the definitions for the sensor units * the definitions for the sensor states/values * the definitions for the legacy struct envsys_tre_data The struct sysmon_envsys_lim and struct envsys_data should be moved into src/sys/dev/sysmon/sysmon_envsysvar.h along with the related internal flags and properties definitions. Comments? - | Paul Goyette | PGP DSS Key fingerprint: | E-mail addresses: | | Customer Service | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com | | Network Engineer | 0786 F758 55DE 53BA 7731 | pgoyette at juniper.net | | Kernel Developer | | pgoyette at netbsd.org | -
Re: Unsafe GENERIC? - Re: (unknown)
On Tue, 16 Mar 2010, Aleksej Saushev wrote: Well... Could we arrange it so that we have safe monolithic GENERIC until issues are resolved somehow? For i386, use MONOLITHIC instad of GENERIC For amd64, the default is still MONOLITHIC, if I remember correctly. - | Paul Goyette | PGP DSS Key fingerprint: | E-mail addresses: | | Customer Service | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com | | Network Engineer | 0786 F758 55DE 53BA 7731 | pgoyette at juniper.net | | Kernel Developer | | pgoyette at netbsd.org | -
Re: Dead ports [Re: config(5) break down]
On Fri, 19 Mar 2010, Thor Lancelot Simon wrote: FYI -- and I doubt you could really buy just one of these at this price -- http://avnetexpress.avnet.com/store/em/EMController/Network-Processor/Cavium-Networks/CN5220-500BG729-SCP-Y-G/_/R-8960980/A-8960980/An-0?action=partcatalogId=500201langId=-1storeId=500201 I tried to get Avnet to provide a quote on an eval board for an Octeon and they never got back to me. (The eval board didn't have a Buy_Now button, only a Request_Quote button.) - | Paul Goyette | PGP DSS Key fingerprint: | E-mail addresses: | | Customer Service | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com | | Network Engineer | 0786 F758 55DE 53BA 7731 | pgoyette at juniper.net | | Kernel Developer | | pgoyette at netbsd.org | -
Soliciting opinions on desired envstat -S behavior
Currently, the -S option for envstat(8) is documented to ...restore defaults to all devices registered with the framework. This will remove all properties that were set in the configuration file to the setting that the drivers use by default. This was written this way long ago, before sysmon_envsys(9) learned how to import/export limit setting with hardware drivers. As a result, the current behavior is sub-optimal, as it removes _ALL_ sensor limits, regardless of whether those limits were learned from the hardware driver or provided by the configuration file. Even worse, the limits are removed from within the sysmon_envsys framework, but the hardware driver is not notified of the removal; thus the hardware may still be checking the limits, and with the recent introduction of non-polled event delivery, the driver could actually report a limits event for a sensor with no limits! I'm soliciting opinions on what the correct behavior should be, for each of the following sensor categories: A. Sensors that have no hardware knowledge of limit values, and rely only on the sysmon_envsys(9) framework. I believe that the current behavior is correct - all knowledge of limit values is discarded. B. Sensors that can provide initial limit values, but which cannnot handle changes to those values in hardware. All limit processing is actually handled in the sysmon_envsys(9) framework and not in the driver or hardware. The current behavior works, but I believe that we should really re-import the initial values from the driver/hardware. This is closest to the documented behavior. C. Sensors that can actually process limit checking in the hardware (or in the driver) and that can accept new limit values to replace any initial (or factory default) values. I see at least three options here. C1. In addition to current behavior (removing all limits from the sysmon_envsys(9) framework), we should update the driver/hardware to let it remove its own limits. or C2. Behave as proposed in (B) above, and simply reimport the current values from the hardware/driver after clearing the framework. or C3. Remember the initial values from when the sensor was first registered, and after clearing the framework, reset both the hardware/driver and the framework to the remembered values. I'd really appreciate comments/feedback/suggestions/etc. - | Paul Goyette | PGP DSS Key fingerprint: | E-mail addresses: | | Customer Service | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com | | Network Engineer | 0786 F758 55DE 53BA 7731 | pgoyette at juniper.net | | Kernel Developer | | pgoyette at netbsd.org | -
Re: Soliciting opinions on desired envstat -S behavior
On Sun, 21 Mar 2010, Jukka Ruohonen wrote: This part certainly gets complicated, and a C3 variation may sound like the right thing to do. I think it would be best if the drivers could reset the limits internally with an appropriate reset command issued directly to the underlying hardware as opposed to the bookkeeping of the values in the software (otherwise, what would happen after a soft reboot?), but I'm unaware regarding whether or not such hardware support is usually present. Additionally, the limit values can be in read-only registers, etc. If the registers are read-only, the driver should not provide a xxx_set_limits() callback. Furthermore, it should either prevent the user from changing the limits values (with the FMONNOTSUPP flag) or leave value/limit checking to the software framework. In either case, if the registers are read-only, the driver falls into category A or B. - | Paul Goyette | PGP DSS Key fingerprint: | E-mail addresses: | | Customer Service | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com | | Network Engineer | 0786 F758 55DE 53BA 7731 | pgoyette at juniper.net | | Kernel Developer | | pgoyette at netbsd.org | -
Re: Using proplist ioctl's from within the kernel
On Thu, 25 Mar 2010, Matthias Drochner wrote: jeanyves.mig...@free.fr said: There should be some way to serialize/deserialize prop/plistref objects within the kernel, but I never found a solution. I suppose you will have to dig deeper than I did :/ I think it should be avoided at all costs. With proplists, you give up type safety and you create a lot of overhead. This is acceptable for kernel-user interfaces in some cases but not within the kernel. So would it be appropriate for me to expose sysmon_envsys's global mutex and its device list in order for acpi_apm to scan for battery sensors and A/C adapters? - | Paul Goyette | PGP DSS Key fingerprint: | E-mail addresses: | | Customer Service | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com | | Network Engineer | 0786 F758 55DE 53BA 7731 | pgoyette at juniper.net | | Kernel Developer | | pgoyette at netbsd.org | -
Re: Using proplist ioctl's from within the kernel
On Fri, 26 Mar 2010, haad wrote: I think it should be avoided at all costs. With proplists, you give up type safety and you create a lot of overhead. This is acceptable for kernel-user interfaces in some cases but not within the kernel. So would it be appropriate for me to expose sysmon_envsys's global mutex and its device list in order for acpi_apm to scan for battery sensors and A/C adapters? This is wrong and should be avoided IMHO. Well, as it turns out, both the global mutex and the device list are already exposed, so it is a rather trivial task for acpi_apm.c to scan the device list. - | Paul Goyette | PGP DSS Key fingerprint: | E-mail addresses: | | Customer Service | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com | | Network Engineer | 0786 F758 55DE 53BA 7731 | pgoyette at juniper.net | | Kernel Developer | | pgoyette at netbsd.org | -
xxxVERBOSE module?
From the comments in the GENERIC config files, the primary reason for omitting the various xxxVERBOSE options is to avoid including large text tables in the resulting kernel. And I vaguely recall some spirited discussion back when the change was made to exclude these options by default. Now that we have MODULAR kernels (at least on some architectures), I've been wondering if it might make sense to create a mod_verbose that could be loaded during start-up time and then unloaded after the machine is up and running. (For plug-and-play situations, such as USB, the module could be reloaded and unloaded whenever a new device is added.) Is this something that would be useful? - | Paul Goyette | PGP Key fingerprint: | E-mail addresses: | | Customer Service | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com| | Network Engineer | 0786 F758 55DE 53BA 7731 | pgoyette at juniper.net | | Kernel Developer | | pgoyette at netbsd.org | -
Re: xxxVERBOSE module?
On Thu, 20 May 2010, Manuel Bouyer wrote: On Thu, May 20, 2010 at 08:13:15AM +0200, Marc Balmer wrote: To my understanding most of the VERBOSE options are compile-time switches which are used to conditionally compile debug code using #ifdefs. I am not sure how such a module should work then, changing #ifdef'ed code into code that is always compiled, but only used under certain circumstances? No, this was about PCIVERBOSE, USBVERBOSE and SCSIVERBOSE, which adds in the kernel table of strings for vendor/device names, extended error codes, etc ... Correct - that was my intention. - | Paul Goyette | PGP Key fingerprint: | E-mail addresses: | | Customer Service | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com| | Network Engineer | 0786 F758 55DE 53BA 7731 | pgoyette at juniper.net | | Kernel Developer | | pgoyette at netbsd.org | -
Re: xxxVERBOSE module?
On Thu, 20 May 2010, David Young wrote: On Wed, May 19, 2010 at 08:52:52PM -0700, Paul Goyette wrote: From the comments in the GENERIC config files, the primary reason for omitting the various xxxVERBOSE options is to avoid including large text tables in the resulting kernel. And I vaguely recall some spirited discussion back when the change was made to exclude these options by default. Now that we have MODULAR kernels (at least on some architectures), I've been wondering if it might make sense to create a mod_verbose that could be loaded during start-up time and then unloaded after the machine is up and running. (For plug-and-play situations, such as USB, the module could be reloaded and unloaded whenever a new device is added.) Is this something that would be useful? Yes, it would be. ... Kewl - I plan on starting with PCIVERBOSE, followed closely with USB. ... Just an idea: we may get a lot of mileage out of a generic method for adding or augmenting, and removing kernel property lists from a kernel, either at run- or compile-time. Consider that much of the VERBOSE info is in tabular form, thus expressible by property list. So are PCI/USB/... vendor/product tables that we use to match drivers with devices---I wish we could change those tables without recompiling. I anticipate using property lists to express device locators. So it seems to me that linking property lists into the kernel could be very useful. Great idea, but I think I'll save that for a future time when I have a bit more free time! - | Paul Goyette | PGP Key fingerprint: | E-mail addresses: | | Customer Service | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com| | Network Engineer | 0786 F758 55DE 53BA 7731 | pgoyette at juniper.net | | Kernel Developer | | pgoyette at netbsd.org | -
Kernel modules - documentation?
Is there any documentation on the modules interface or API? There does not seem to be anything in the man pages... My specific questions: What actually triggers an autoload of a module? (There seem to be very few places where module_autoload() is called.) What is the semantic difference between module_autoload() and a normal module_load()? Does the code which calls either of these routines need to be concerned with whether the module has been previously loaded? Is it OK to load a module that has already been loaded? Given that there is a kernel thread that runs around and attempts to unload any unreferenced modules that have been loaded for a while, is it ever necessary or desirable to explicitly unload a module? What happens if a global symbol referenced by a module doesn't exist? Does the module get loaded anyway, leaving the reference unresolved? Thanks in advance! - | Paul Goyette | PGP Key fingerprint: | E-mail addresses: | | Customer Service | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com| | Network Engineer | 0786 F758 55DE 53BA 7731 | pgoyette at juniper.net | | Kernel Developer | | pgoyette at netbsd.org | -
Re: Kernel modules - documentation?
On Sat, 22 May 2010, Adam Hamsik wrote: My specific questions: What actually triggers an autoload of a module? (There seem to be very few places where module_autoload() is called.) Device module_autoloading is done in specfs_open. OK. So I guess there's no generic Gee, the kernel just tried to reference something that's not here, so maybe we can find a module to resolve that reference. What is the semantic difference between module_autoload() and a normal module_load()? module_autoload happens automatically when you try to open device node which doesn't have device driver in kernel. My question was more along the lines of what is the _difference_ between module_load() vs module_autoload()? Or even simpler, Why do both routines exist? :) Does the code which calls either of these routines need to be concerned with whether the module has been previously loaded? Is it OK to load a module that has already been loaded? I think that this is not possible and these routines return an error when you try to do that. That's OK, I'd expect the error. I can ignore that. I just need to be sure that the previously-loaded module doesn't get screwed up from the attempt to load it the second time. Given that there is a kernel thread that runs around and attempts to unload any unreferenced modules that have been loaded for a while, is it ever necessary or desirable to explicitly unload a module? It doesn't work this way there is a thread(workqueue) which try to unload module in first 300 seconds or mili seconds I can't remember now. But this thread or whatever is it doesn't unload modules older that set limit AFAIK. Hmmm, I must have misunderstood this code - time to go look again. What happens if a global symbol referenced by a module doesn't exist? Does the module get loaded anyway, leaving the reference unresolved? If you have a module which uses foo routine and it is not defined in kernel or loaded modules your module will simply not load. This is actually what I was hoping would happen. - | Paul Goyette | PGP Key fingerprint: | E-mail addresses: | | Customer Service | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com| | Network Engineer | 0786 F758 55DE 53BA 7731 | pgoyette at juniper.net | | Kernel Developer | | pgoyette at netbsd.org | -
Re: xxxVERBOSE module?
On Fri, 21 May 2010, John Nemeth wrote: } be loaded during start-up time and then unloaded after the machine is up It would have to be loaded by the boot loader then. As far as I know, only the x86 boot loader is capable of loading modules. I'm almost finished with the PCIVERBOSE stuff... My current approach is to load the module right before the first pcibus is enumerated, and unload when finished. So we can use the in-kernel loader/linker for whichever platforms it supports. For other platforms it will still be possible to set 'options PCIVERBOSE' to generate a built-in module. The fun part is making sure that the shared code still plays nicely with src/lib/libpci :) Yes, I think it would be very useful. I was thinking of looking at this myself. Thanks for taking on this task. It's one thing I can cross off my TODO list. :-) No worries! When I finish up with PCI, I'll start in on USB. - | Paul Goyette | PGP Key fingerprint: | E-mail addresses: | | Customer Service | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com| | Network Engineer | 0786 F758 55DE 53BA 7731 | pgoyette at juniper.net | | Kernel Developer | | pgoyette at netbsd.org | -
Re: xxxVERBOSE module?
On Fri, 21 May 2010, John Nemeth wrote: } My current approach is to load the module right before the first pcibus } is enumerated, and unload when finished. So we can use the in-kernel File systems aren't initialised during autoconf when the system is being cold booted, thus it isn't possible for the kernel to load a module at that point in time. Also consider that on most platforms with PCI, prior to the first pcibus being enumerated, the kernel doesn't know anything about any disk drives that may be attached to the system. Ah, yeah! Ooppss! :) I guess I can probably remove those changes. So we'll have to rely on either the module being built-in or established by the boot loader. } The fun part is making sure that the shared code still plays nicely with } src/lib/libpci :) I don't know anything about libpci, but have fun with that. The only thing I find that uses libpci is /usr/sbin/pcictl and that seems to be working fine. } No worries! When I finish up with PCI, I'll start in on USB. I imagine that will be mostly copy/paste. Mostly. - | Paul Goyette | PGP Key fingerprint: | E-mail addresses: | | Customer Service | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com| | Network Engineer | 0786 F758 55DE 53BA 7731 | pgoyette at juniper.net | | Kernel Developer | | pgoyette at netbsd.org | -
Re: xxxVERBOSE module?
On Sat, 22 May 2010, John Nemeth wrote: } (referring to loading the pciverbose module...) } I guess I can probably remove those changes. So we'll have to rely on Although, you can't test them, possibly keep them. We do want to support hotswap PCI at some point. It turns out that I need to keep them after all! Without an explicit load, the modules don't get initalized until much later in system startup, and this module beomces rather useless. :) There is still one problem, though. As you (John) pointed out earlier, we don't have any accessible filesystems from which to load modules. But unfortunately, the kern_module subsystem doesn't seem to realize this. So, if the pci code requests the module to be loaded, and the module is neither built-in nor provided by the boot loader, it _will_ attempt to load from the filesystem. I expected it to fail, but the failure mode is rather ungraceful - the kernel panics with backtrace _atomic_inc_32 namei + x109 vn_open + x84 kobj_load_vfs + x82 module_load_vfs + x2cf module_do_load + 3e9 module_load + x68 It would seem to me that somewhere in this mess we should simply return ENOENT (or some other appropriate errno). Also, dyoung@ is in the process of merging cardbus into PCI. I don't know if your changes would cause mod_verbose to be autoloaded for cardbus, which uses PCIVERBOSE, but cardbus insertion should be handled at some point. And, of course, there is PCIe and expresscard on the horizon. expresscard is basically hotswap PCIe along with USB (i.e. an expresscard slot brings out signals for both and the card determines which one it is going to use). Both of these will be reasonable. I've already updated all the cardbus references to the pci_find{vendor,product} routines, so it would be a simple matter of making the pciverbose module gets (re)loaded. What about X? What does it use for scanning the PCI bus? I suspect it uses libpci - I'll be doing a full-tree search for libpci shortly. - | Paul Goyette | PGP Key fingerprint: | E-mail addresses: | | Customer Service | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com| | Network Engineer | 0786 F758 55DE 53BA 7731 | pgoyette at juniper.net | | Kernel Developer | | pgoyette at netbsd.org | -
Re: Enabling built-in modules earlier in init
On Wed, 16 Jun 2010, Antti Kantee wrote: On Wed Jun 16 2010 at 04:13:54 -0700, Paul Goyette wrote: With the current ways of secmodel register, I'd be damn careful to not push it around. The effect is that if it's called 0 times, you have a system which allows everything. So if your suggestion is implemented and you're testing a new secmodel which buggily omits register alongside another correctly registering secmodel, things will appear to work fine, But if in some scenario the buggy one is loaded alone, well ... welcome to the wishing well. I had some concern about this as well, wondering if I would be able to be sure I'd found all the secmodel modules that might exist. Especially ones which aren't in src! Perhaps it would be best to retain MODULE_CLASS_SECMODEL and also add the suggested MODULE_CLASS_EARLY? That would be my vote. But, early is a little vague. What if in the future we want modules which are initialized even earlier. Will those be called MODULE_CLASS_EARLIER_THAN_EARLY? If the class means intialized before autoconf, why not use that in the name? Would MODULE_CLASS_AUTOCONF work? Also, the modclass id is exported to userland and used as an index to a table in modstat. I think I filed a PR about this being suboptimal. Yeah, I was planning to update modstat(8) as well. The better choice is to update modctl(2) to pass down the information as a proplist. That way even module classes are pluggable and other information is easy to add if necessary. I'm secretly hoping someone will do this before 6.0 ... ;) I might do it, but on a second pass. :) - | Paul Goyette | PGP Key fingerprint: | E-mail addresses: | | Customer Service | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com| | Network Engineer | 0786 F758 55DE 53BA 7731 | pgoyette at juniper.net | | Kernel Developer | | pgoyette at netbsd.org | -
Re: Enabling built-in modules earlier in init
On Wed, 16 Jun 2010, Antti Kantee wrote: I have to admit I haven't been following your work too closely, but builtin modules are initialized either when all of them are initialized per class or when their initialization is explicitly requested. So if whatever uses PCIVERBOSE requests the load of the PCIVERBOSE module, it should be initialized and you should be fine (see module_do_load()). The only but is that explicit loads must be accompanied by MODCTL_LOAD_FORCE. I wrote it that way because of the security use case: if you disable a builtin module due to a security hole, you don't want it to get autoloaded later. For file system modules you can always use rm, but for builtins you don't have that luxury. So if that is actually what you're chocking on, I suggest adding some flag to determine if the module has ever been loaded and ignore the need for -F if it hasn't. The attached diffs add a new mod_disabled member to the module_t structure, and set the value to false in each place that a new entry is created. (Since all of the allocations of module_t structures are done with kmem_zalloc() I could probably avoid the explicit setting of the value to false.) The value is set to true whenever a module is removed from active duty and returned to the module_builtin list. (I specifically did NOT mark a module disabled if its modcmd(INIT) failed, under the assumption that it might succeed in a later retry.) Loading of an entry from the module_builtin list no longer requires the -f flag if the entry has not been disabled. This approach avoids the need for defining new a new module class, and it avoids needing to mess around with the current secmodel_register() stuff. - | Paul Goyette | PGP Key fingerprint: | E-mail addresses: | | Customer Service | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com| | Network Engineer | 0786 F758 55DE 53BA 7731 | pgoyette at juniper.net | | Kernel Developer | | pgoyette at netbsd.org | -Index: sys/module.h === RCS file: /cvsroot/src/sys/sys/module.h,v retrieving revision 1.23 diff -u -p -r1.23 module.h --- sys/module.h24 May 2010 03:50:25 - 1.23 +++ sys/module.h16 Jun 2010 12:42:42 - @@ -90,6 +90,7 @@ typedef struct module { time_t mod_autotime; void*mod_ctf; u_int mod_fbtentries; /* DTrace FBT entrie count */ + boolmod_disabled; } module_t; /* Index: kern/kern_module.c === RCS file: /cvsroot/src/sys/kern/kern_module.c,v retrieving revision 1.69 diff -u -p -r1.69 kern_module.c --- kern/kern_module.c 26 May 2010 23:53:21 - 1.69 +++ kern/kern_module.c 16 Jun 2010 12:42:42 - @@ -195,6 +195,7 @@ module_builtin_add(modinfo_t *const *mip modp[i] = kmem_zalloc(sizeof(*modp[i]), KM_SLEEP); modp[i]-mod_info = mip[i+mipskip]; modp[i]-mod_source = MODULE_SOURCE_KERNEL; + modp[i]-mod_disabled = false; } mutex_enter(module_lock); @@ -416,6 +417,7 @@ module_init_class(modclass_t class) * init. */ if (module_do_builtin(mi-mi_name, NULL) != 0) { + mod-mod_disabled = true; TAILQ_REMOVE(module_builtins, mod, mod_chain); TAILQ_INSERT_TAIL(bi_fail, mod, mod_chain); } @@ -794,7 +796,7 @@ module_do_load(const char *name, bool is } } if (mod) { - if ((flags MODCTL_LOAD_FORCE) == 0) { + if (mod-mod_disabled (flags MODCTL_LOAD_FORCE) == 0) { if (!autoload) { module_error(use -f to reinstate builtin module \%s\, name); @@ -854,6 +856,7 @@ module_do_load(const char *name, bool is return error; } mod-mod_source = MODULE_SOURCE_FILESYS; + mod-mod_disabled = false; TAILQ_INSERT_TAIL(pending, mod, mod_chain); error = module_fetch_info(mod); @@ -1086,6 +1089,7 @@ module_do_unload(const char *name) } if (mod-mod_source == MODULE_SOURCE_KERNEL) { mod-mod_nrequired = 0; /* will be re-parsed */ + mod-mod_disabled = true; TAILQ_INSERT_TAIL(module_builtins, mod, mod_chain); module_builtinlist++; } else { @@ -1129,6 +1133,7 @@ module_prime(void *base, size_t size) } TAILQ_INSERT_TAIL(module_bootlist, mod
Re: Enabling built-in modules earlier in init
On Thu, 17 Jun 2010, Antti Kantee wrote: On Wed Jun 16 2010 at 15:36:30 -0700, Paul Goyette wrote: The attached diffs add one more routine, module_init3() which gets called from init_main() right after module_class_init(MODULE_CLASS_ANY). module_init3() walks the list of builtin modules that have not already been init'd and marks them disabled. Tested briefly on my home systems and appears to work. Any objections to committing this? I'd still hook it to the end of module_class_init(MODULE_CLASS_ANY) instead of adding more randomly numbered module_initn() calls. The other benefit from doing so is that you get it done atomically, which is always worthwhile, and doubly so when it's a low hanging fruit like here. I dislike adding another special-case-overload. In my mind, having module_class_init(MODULE_CLASS_ANY) have the additional effect of disabling un-init'd modules is not much different from automatically registering modules in MODULE_CLASS_SECMODEL. This thread has already noted that such registration belongs in each module's modcmd(). @@ -416,6 +434,7 @@ module_init_class(modclass_t class) * init. */ if (module_do_builtin(mi-mi_name, NULL) != 0) { + mod-mod_disabled = true; TAILQ_REMOVE(module_builtins, mod, mod_chain); TAILQ_INSERT_TAIL(bi_fail, mod, mod_chain); } Why do you mark it as disabled? Doesn't this conflict with the it might succeed in a later module_init_class() idea you presented earlier? Yes, it does conflict. I've removed this line, and updated the comment block above. module_disabled = true/false in multiple places looks a little error-prone. Now that struct module is growing more and more members, maybe we can just have an object allocator which initializes the value and afterwards the only acceptable mutation for module_disabled is setting it to true (might make sense to rename the variable to something like module_virgin and flip the polarity, though). HeHe - module_virgin would seem to be a bit obscure to me if I hadn't written the code. Perhaps module_autoload_ok would be acceptable (with a state flip)? Or module_require_force (with no flip)? The attached diffs use module_autoload_ok instead of module_disabled (and state flip), provide an object allocator, and a single mutation function. It also avoids disabling the module if it is unloaded by the auto-unload thread; only modules that are explicitly unloaded by name should be prevented from future autoloads. I've renamed both module_init2() and module_init3() to more descriptive routine names, but for now I've kept _init3() as a separate function and not made it part of module_init_class(). I'll be doing some testing of this over the next day or two before committing. As always, review, comments, and suggestions (and yes, even criticisms!) are welcomed and solicited. :) - | Paul Goyette | PGP Key fingerprint: | E-mail addresses: | | Customer Service | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com| | Network Engineer | 0786 F758 55DE 53BA 7731 | pgoyette at juniper.net | | Kernel Developer | | pgoyette at netbsd.org | -Index: sys/sys/module.h === RCS file: /cvsroot/src/sys/sys/module.h,v retrieving revision 1.23 diff -u -p -r1.23 module.h --- sys/sys/module.h24 May 2010 03:50:25 - 1.23 +++ sys/sys/module.h17 Jun 2010 12:53:24 - @@ -90,6 +90,7 @@ typedef struct module { time_t mod_autotime; void*mod_ctf; u_int mod_fbtentries; /* DTrace FBT entrie count */ + boolmod_autoload_ok; } module_t; /* @@ -120,7 +121,8 @@ extern struct modlist module_builtins; extern u_int module_gen; void module_init(void); -void module_init2(void); +void module_start_unload_thread(void); +void module_builtin_no_autoload(void); void module_init_md(void); void module_init_class(modclass_t); intmodule_prime(void *, size_t); Index: sys/kern/init_main.c === RCS file: /cvsroot/src/sys/kern/init_main.c,v retrieving revision 1.420 diff -u -p -r1.420 init_main.c --- sys/kern/init_main.c10 Jun 2010 20:54:53 - 1.420 +++ sys/kern/init_main.c17 Jun 2010 12:53:24 - @@ -430,7 +430,7 @@ main(void) loginit(); /* Second part of module system initialization. */ - module_init2(); + module_start_unload_thread(); /* Initialize the file systems. */ #ifdef NVNODE_IMPLICIT @@ -594,9
Preserving early console output (pre-Copyright stuff)
I'm trying to debug some stuff on a new server, and there are some messages of interest that get printf'd on the console very early in the system's life, before the Copyright (c)... lines. There is a large amount of info being printed, way more than a screenful and way more than can be captured with an eyeball snapshot. Unfortunately, it seems that this information is NOT retained in the kernel's message buffer, so it never gets collected by syslog. (For those who want to know details, some of the messages in question are generated by add_mem_cluster() in src/sys/arch/x86/x86/x86_machdep.c) So, is there either a) a way to capture these messages in the normal message buffer for later collection by syslog, or b) a way to pause long enough to manually transcribe the output? (A simple timed delay would work, although a Press any key to continue would be easier!) Thanks in advance! - | Paul Goyette | PGP Key fingerprint: | E-mail addresses: | | Customer Service | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com| | Network Engineer | 0786 F758 55DE 53BA 7731 | pgoyette at juniper.net | | Kernel Developer | | pgoyette at netbsd.org | -
Re: Preserving early console output (pre-Copyright stuff)
On Thu, 1 Jul 2010, Allen Briggs wrote: Heh. You do need a serial cable and another computer, but it's not too hard to set up. You don't need to find any options in the BIOS if you install the right boot blocks. Yeah, OK, I got a cable, plugged it in, run cu(1) on one end, and added a new entry to the /boot.cfg on the other end: menu=Serial/Single/Verbose/Debug:consdev com0; boot -svx Then just boot, and things magically work! Thanks for the clue-by-four - | Paul Goyette | PGP Key fingerprint: | E-mail addresses: | | Customer Service | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com| | Network Engineer | 0786 F758 55DE 53BA 7731 | pgoyette at juniper.net | | Kernel Developer | | pgoyette at netbsd.org | -
Add NULL power handler to x86/ipmi(4)
Is there any reason anyone can think of to not add a NULL power handler to the ipmi(4) driver? I can't see any reason for anything special to happen either at suspend or resume, and the lack of a power handler prevents the system from going to sleep at all. Proposed patch is attached. - | Paul Goyette | PGP Key fingerprint: | E-mail addresses: | | Customer Service | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com| | Network Engineer | 0786 F758 55DE 53BA 7731 | pgoyette at juniper.net | | Kernel Developer | | pgoyette at netbsd.org | -Index: ipmi.c === RCS file: /cvsroot/src/sys/arch/x86/x86/ipmi.c,v retrieving revision 1.46 diff -u -p -r1.46 ipmi.c --- ipmi.c 10 Apr 2010 19:02:39 - 1.46 +++ ipmi.c 16 Jul 2010 12:19:09 - @@ -1934,6 +1934,10 @@ ipmi_thread(void *cookie) sc-sc_wdog.smw_tickle = ipmi_watchdog_tickle; sysmon_wdog_register(sc-sc_wdog); + /* Set up a NULL power handler so we can possibly sleep */ + if (!pmf_device_register(self, NULL, NULL)) +aprint_error_dev(self, couldn't establish a power handler\n); + mutex_enter(sc-sc_poll_mtx); while (sc-sc_thread_running) { ipmi_refresh_sensors(sc);
Re: Add NULL power handler to x86/ipmi(4)
On Sat, 17 Jul 2010, David Young wrote: ipmi(4) should probably not suspend if its watchdog timer is active. Would it be sufficient for ipmi(4) to refuse to suspend (return false from the suspend method) if the watchdog is active? Yes. I think that's the right thing to do for now. This is actually fairly easy to do. Or should it make some sort of effort to disable the watchdog, and attempt to reenable the watchdog during the subsequent resume? That defeats the purpose of a watchdog timer, at least for my purposes. This is a local policy choice that should probably not be hard-coded. Maybe we need additional watchdog modes? 1 Watchdog countdown stops during suspension, restarts after Probably a bit more difficult than #3 below, and it assumes that the remaining time is available. 2 Watchdog keeps counting down during suspend (we'll wake periodically to tickle it) This might be rather tricky. Depending on the type of suspend, the time for a wake-up to occur could be faily long and/or variable. And there might not even be a way for the system to schedule its own wake event. 3 Watchdog prevents suspension As I indicated, this is fairly easy to do, simply check if the w-dog is armed or not. The attached patch provides an ipmi_suspend() method to implement #3. But I suspect that this is really a more generic watchdog capability, and it should be implemented for the entire class of watchdogs rather than only for ipmi(4)? Comments? - | Paul Goyette | PGP Key fingerprint: | E-mail addresses: | | Customer Service | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com| | Network Engineer | 0786 F758 55DE 53BA 7731 | pgoyette at juniper.net | | Kernel Developer | | pgoyette at netbsd.org | -Index: ipmi.c === RCS file: /cvsroot/src/sys/arch/x86/x86/ipmi.c,v retrieving revision 1.46 diff -u -p -r1.46 ipmi.c --- ipmi.c 10 Apr 2010 19:02:39 - 1.46 +++ ipmi.c 17 Jul 2010 02:40:27 - @@ -225,6 +225,8 @@ int ipmi_sensor_status(struct ipmi_softc int add_child_sensors(struct ipmi_softc *, uint8_t *, int, int, int, int, int, int, const char *); +bool ipmi_suspend(device_t, const pmf_qual_t *); + struct ipmi_if kcs_if = { KCS, IPMI_IF_KCS_NREGS, @@ -1934,6 +1936,10 @@ ipmi_thread(void *cookie) sc-sc_wdog.smw_tickle = ipmi_watchdog_tickle; sysmon_wdog_register(sc-sc_wdog); + /* Set up a power handler so we can possibly sleep */ + if (!pmf_device_register(self, ipmi_suspend, NULL)) +aprint_error_dev(self, couldn't establish a power handler\n); + mutex_enter(sc-sc_poll_mtx); while (sc-sc_thread_running) { ipmi_refresh_sensors(sc); @@ -2093,3 +2099,14 @@ ipmi_dotickle(struct ipmi_softc *sc) device_xname(sc-sc_dev), rc); } } + +bool +ipmi_suspend(device_t dev, const pmf_qual_t *qual) +{ + struct ipmi_softc *sc = device_private(dev); + + /* Don't allow suspend if watchdog is armed */ + if ((sc-sc_wdog.smw_mode WDOG_MODE_MASK) != WDOG_MODE_DISARMED) + return false; + return true; +}
Re: Add NULL power handler to x86/ipmi(4)
On Sat, 17 Jul 2010, David Young wrote: It is a generic capability. ... That's the conclusion I came to. ... ISTM watchdog timers should eventually be refactored in this way: each watchdog timer in the system should have a corresponding pseudo-device, an instance of wdog(4). wdog(4) provides its own suspension/resumption/detachment routines. Let wdog_suspend() return false if the watchdog is active (for now). ipmi0 should attach wdog0 using a wdogbus interface attribute. Let struct sysmon_wdog become the watchdog attachment arguments, wdog_attach_args. That's probably a bit more work than I want to commit to right now. I was hoping there would be an easier way, since the sysmon pseudo-device already exists (with exactly one device-minor for wdogs). But I could not find anywhere a device_t, so no way to register a power handler. A also looked at just creating a new pmf device class for watchdogs, since most watchdogs are actually associated with some other hardware device (and therefore there's a device_t to use when registering the handler). But there's pseudo-device swwdog(4) that has no hardware behind it, so I ended up back with the same problem! Your patch looks fine. Thanks. I will commit this for now, with a big XXX in the commit log. I've also got a patch for itesio(4). Currently, itesio registers a NULL suspend handler, whether or not the particular device includes a wdog. My patch will register a real suspend handler if the wdog is present. I'll add watchdog-factoring to my To-Do list, but for now I'll just commit the changes for itesio(4) and ipmi(4). - | Paul Goyette | PGP Key fingerprint: | E-mail addresses: | | Customer Service | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com| | Network Engineer | 0786 F758 55DE 53BA 7731 | pgoyette at juniper.net | | Kernel Developer | | pgoyette at netbsd.org | -
Re: power management and pseudo-devices
On Mon, 19 Jul 2010, Antti Kantee wrote: Include ioconf.h I think. Tried that. It works for compiling the kernel. Unfortunately, swwdog is included in rump, and there doesn't seem to be an ioconf.h available for the librump build. Well, whatever. I don't think I want to look at that. :) Actually, I still think this is wrong. It might make librump compile and even link, that doesn't mean it will be usable if nothing ever creates that extern symbol. You'll have to check with pooka or explore the code, but there have to be some components of rump that have partial configuration files. After all, he created the ioconf directive for that purpose. First of all, let's take a step up from the trenches and try to understand the problem we're dealing with instead of trying to arbitrarily guess how to fix the build. We have two ways of building kernel code: monolithic (./build.sh kernel=CONF) and modular (kernel modules, rump components). The latter ones currently always do config stuff on-demand, so changes cause breakage. Should a swwdog kernel module exist (why isn't there one?), it would run into the same problem. Now, the ioconf keyword for config(1) is meant to help build modular kernel code by allowing to specify a partial config file. Currently, as the name implies, it takes care of creating ioconf.[hc], namely in this case struct cfdriver swwdog_cd. Hmmm, I don't seem to see any mention of the ioconf keyword in the current config(1) or config(5) man pages. Adding a SYSMON.ioconf will solve the problem: === snip === ioconf sysmon include conf/files pseudo-device swwdog === snip === The good news is that if some day a sysmon kernel module is added, the exact same ioconf can be used without having to once again run into trouble. Yay - it appears to work! Revised diffs are attached to this E-mail. Then let's view the broader scale. I think acpibat is currently the only kernel module using ioconf, and I haven't bothered converting others since I have realized that the scope of ioconf was a little too narrow. I'm planning to change the keyword to module and add support for source files. This way the default build for every modular kernel component goes through config, and we can avoid issues due to config changes. IIRC I have the config(1) part for this done already, but being able to use an autogenerated SRCS list requires some bsd.subdir.mk style advanced make hackery which I haven't been in the mood for. Plus, there's of course the mess with file some.c foo | bar (baz ^ xyzzy) !!!frobnitz. Finally, on the eternal someone should astral plane, someone should fix the kernel build to consist of building a set of modules and linking them together, so we don't have more than one way to skin a kernel. I'm pretty sure someone != me :) - | Paul Goyette | PGP Key fingerprint: | E-mail addresses: | | Customer Service | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com| | Network Engineer | 0786 F758 55DE 53BA 7731 | pgoyette at juniper.net | | Kernel Developer | | pgoyette at netbsd.org | -Index: sys/rump/dev/lib/libsysmon/SYSMON.ioconf === RCS file: sys/rump/dev/lib/libsysmon/SYSMON.ioconf diff -N sys/rump/dev/lib/libsysmon/SYSMON.ioconf --- /dev/null 1 Jan 1970 00:00:00 - +++ sys/rump/dev/lib/libsysmon/SYSMON.ioconf19 Jul 2010 12:50:54 - @@ -0,0 +1,8 @@ +# $NetBSD$ +# + +ioconf swwdog + +include conf/files + +pseudo-device swwdog Index: sys/rump/dev/lib/libsysmon/Makefile === RCS file: /cvsroot/src/sys/rump/dev/lib/libsysmon/Makefile,v retrieving revision 1.2 diff -u -p -r1.2 Makefile --- sys/rump/dev/lib/libsysmon/Makefile 16 Feb 2010 20:42:45 - 1.2 +++ sys/rump/dev/lib/libsysmon/Makefile 19 Jul 2010 12:50:54 - @@ -4,6 +4,7 @@ .PATH: ${.CURDIR}/../../../../dev/sysmon LIB= rumpdev_sysmon +IOCONF=SYSMON.ioconf SRCS= sysmon_taskq.c sysmon_power.c sysmon_envsys.c sysmon_envsys_events.c \ sysmon_envsys_tables.c sysmon_envsys_util.c sysmon_wdog.c sysmon.c \ Index: sys/dev/sysmon/files.sysmon === RCS file: /cvsroot/src/sys/dev/sysmon/files.sysmon,v retrieving revision 1.11 diff -u -p -r1.11 files.sysmon --- sys/dev/sysmon/files.sysmon 30 Jan 2010 21:55:28 - 1.11 +++ sys/dev/sysmon/files.sysmon 19 Jul 2010 12:50:54 - @@ -18,5 +18,5 @@ file dev/sysmon/sysmon_wdog.csysmon_wdo file dev/sysmon/sysmon.c sysmon_envsys | sysmon_wdog | sysmon_power -defpseudo swwdog: sysmon_wdog +defpseudodev swwdog: sysmon_wdog filedev/sysmon/swwdog.cswwdog Index: sys/dev
Modules loading modules?
Consider the following scenario: We have a modular device driver, let's call it xxxmod. The driver's xxx_modcmd(MODULE_CMD_INIT, ...) routine handles all of its initialization, including interaction with autoconf(9). It therefore might attempt to use an optional module (e.g., zzzverbose) to print some device attachment messages. (Another possible scenario would be to have one modular driver xxxmod that calls config_found_xxx() which in turn requires another modular driver yyymod.) We have no problem if the zzzverbose module is already auto-loaded. But if the module is not present, or if it has already been auto- unloaded, the current code will call module_autoload() to attempt to (re)load it. Since this call happens in xxx_modcmd(), we end calling mutex_enter() while the mutex is still owned. There is currently a requires mechanism that allows recursion within the module loading process. But it doesn't quite solve the problem, for at least three reasons. First, a required module cannot be optional. If the desired module is not present, or if it is present but its own xxx_modcmd(MODULE_CMD_INIT, ...) fails, the failure is propagated back to the original outer call to module_load() which will also fail. The second reason why this is not suitable is that the outer load will add a reference to the module, preventing it from being auto-unloaded. (A key benefit of having the xxx_verbose modules is that their large text tables can be unloaded.) A third reason that tends to make this option unuseable is the need to identify ahead of time all of the possible optional modules and maintain the dependency list. Adding a new driver would thus require updating the dependency list in the parent device's driver. Any suggestions on how to resolve this conundrum? Without a solution, the zzz_verbose modules and modularized drivers that might use the zzz_verbose modules cannot co-exist, and we won't be able to have modular drivers for devices whose children are served by other modular drivers. - | Paul Goyette | PGP Key fingerprint: | E-mail addresses: | | Customer Service | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com| | Network Engineer | 0786 F758 55DE 53BA 7731 | pgoyette at juniper.net | | Kernel Developer | | pgoyette at netbsd.org | -
Re: envsys issues [was Re: envstat wrong: who's at fault?]
On Mon, 26 Jul 2010, der Mouse wrote: [moved from port-i386] [me] In passing, would it be appropriate and/or useful to suggest improvements to [the envsys(4)] API? When I was writing code, I found the envsys(4) ioctls to be deficient for my purposes. [Paul Goyette] I'd be interested in knowing in which ways you found the current API lacking. [...] Well, some things are underdocumented. For example, it seems that most sensors are not in the units specified by envsys_tre_data_t.units, but rather in 1e-6 of that unit - for example ENVSYS_SWATTHOUR sensors are not in watthours but in microwatthours - and, as far as I've been able to find, this factor of 1e6 is not documented anywhere but the source to envstat(8). Except for temperature, which is specifically said to be in microKelvins. But others are said to be in volts, amps, etc. To quote both the envsys(4) and sys/envsys.h, union { /* all data is given */ uint32_t data_us; /* in microKelvins, */ int32_t data_s; /* rpms, volts, amps, */ } cur, min, max, avg; /* ohms, watts, etc */ /* see units below */ This is greatly cleaned up in the envsysV2 implementation. If the micro is supposed to apply to all those units, not just Kelvins, then (a) rpms needs to be removed from the list and (b) the wording needs to be improved. But the one piece I've found so far that isn't just a lack of documentation (well, unless there are totally undocumented calls) is that there's no way to fetch multiple sensors' values without potential for skew between them.. For example, to quote from the code I've been developing, In envsysV2, a single call is used to retrieve the whole set of sensors (in a prop dictionary) so you don't need to worry about skew caused by userland sampling. Of course, there is still the possibility of skew caused by the driver's need to fiddle with hardware registers. - | Paul Goyette | PGP Key fingerprint: | E-mail addresses: | | Customer Service | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com| | Network Engineer | 0786 F758 55DE 53BA 7731 | pgoyette at juniper.net | | Kernel Developer | | pgoyette at netbsd.org | -
Re: Modules loading modules?
On Mon, 26 Jul 2010, der Mouse wrote: We have a modular device driver, let's call it xxxmod. [...] It [] might attempt to use an optional module (e.g., zzzverbose) to print some device attachment messages. First, a required module cannot be optional. If the desired module is not present, or if it is present but its own xxx_modcmd(MODULE_CMD_INIT, ...) fails, the failure is propagated back to the original outer call to module_load() which will also fail. The second reason why this is not suitable is that the outer load will add a reference to the module, preventing it from being auto-unloaded. Surely the right answer here is to provide a way to say refer to this module, but it's ok for its load to fail, and it's ok for it to get auto-unloaded, including passing up whatever information is necessary for the calling module to do something useful in failure cases? I actually considered adding a set of desired modules as well as the current required modules. That would certainly take care of the immediate problem with the xxx_verbose modules. But it doesn't address the need to have one module/driver loading another one for a child device, and it would be a maintenance nightmare to require updating the dependency-list every time a new child driver gets created. It really seems to me that the module system is there to help us, not to shackle us, and that if it has properties which are leading to problems, one of the options we should at the very least be considering is changing those properties. Yes, the current restriction appears to be burdensome. I'm just trying to get some consensus on that - so far I've seen only one negative opinion (from pooka@ IIRC). If we can agree that it's desirable for one module's xxx_modcmd() to load another module, then we can move to step 2 and figure out how to make it happen without breaking anything. If we don't get consensus, then I'll most likely need to revert the xxx_verbose modules. - | Paul Goyette | PGP Key fingerprint: | E-mail addresses: | | Customer Service | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com| | Network Engineer | 0786 F758 55DE 53BA 7731 | pgoyette at juniper.net | | Kernel Developer | | pgoyette at netbsd.org | -
Re: Modules loading modules?
On Wed, 28 Jul 2010, Andrew Doran wrote: If modload-from-modcmd is found necessary, sounds more like a case for the infamous recursive lock. Recursive lock is the way to go. I think the same lock should also cover all device configuration activites (i.e. autoconf) and any other heavy lifting where we have chunks of the system coming and going. OK, I'm willing to make an attempt to implement the recursive_lock if noone else wants to give it a try! It will probably take some time... - | Paul Goyette | PGP Key fingerprint: | E-mail addresses: | | Customer Service | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com| | Network Engineer | 0786 F758 55DE 53BA 7731 | pgoyette at juniper.net | | Kernel Developer | | pgoyette at netbsd.org | -
Re: Modules loading modules?
On Wed, 28 Jul 2010, Andrew Doran wrote: it seems to me the root problem is that module_mutex is held while calling into the module startup routines. thus, the right solution is to remove this requirement. Yes, that's what is needed. I'm far from convinced that's a good idea. First, it will probably make the module code a nightmare -- what happens when you have multiple interleaved loads, some of which fail at some point in their dependency stack, and let's just throw in a manual modunload to mix up things further. Second, and pretty much related to number one, it goes against one of the most fundamental principles of robust code: atomic actions. This: atomicity. The atomic behaviour is relied upon to give all or nothing semantics upon load and unload, like transactions in a database. Admittely some modules (not of my making) are sloppy about this and so break that bit of the interface contract. If modload-from-modcmd is found necessary, sounds more like a case for the infamous recursive lock. Recursive lock is the way to go. I think the same lock should also cover all device configuration activites (i.e. autoconf) and any other heavy lifting where we have chunks of the system coming and going. Well, folks, here is a first pass recursive locks! The attached diffs are against -current as of a few minutes ago. Some caveats: 1. This has only been compile-tested for x86 (amd64 i386). 2. Other architectures have not even been compiled yet. It should work but you never know. 3. HPPA is an exceptional exception here, since it is the only in-tree architecture I found that cannot use SIMPLE_LOCKS. 4. There is only one known use case for this so far: modules loading other modules from within their xxx_modcmd() routine. The specific use case we have involves loading the acpicpu driver/module which eventually results in an attempt to load acpiverbose. It would be really nice if the community could A. Compile-test on additional architectures, especially HPPA B. Test to see that existing mutex operations still work correctly C. Exercise the known use case if possible D. Identify additional use cases There is one place in the code (marked with great big XXX) where I am simply unsure if I need to use a CAS-type operation for updating the reference count on the recursive mutex. I _think_ it is save to simply auto-increment the field, but I am totally willing for any of you experts to tell my why this might not be safe. :) I will be updating one of my home machines (amd64) to use this code in the next few days. I don't plan on making any commits until I've had some positive testing feedback, and hopefully some expert commentary on the XXX. (Some additional concensus that this is the right thing to do would also be appreciated!) - | Paul Goyette | PGP Key fingerprint: | E-mail addresses: | | Customer Service | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com| | Network Engineer | 0786 F758 55DE 53BA 7731 | pgoyette at juniper.net | | Kernel Developer | | pgoyette at netbsd.org | -Index: sys/arch/alpha/include/mutex.h === RCS file: /cvsroot/src/sys/arch/alpha/include/mutex.h,v retrieving revision 1.4 diff -u -p -r1.4 mutex.h --- sys/arch/alpha/include/mutex.h 28 Apr 2008 20:23:11 - 1.4 +++ sys/arch/alpha/include/mutex.h 31 Jul 2010 22:34:30 - @@ -43,7 +43,11 @@ struct kmutex { struct kmutex { union { - volatile uintptr_t mtxa_owner; + struct { + volatile uintptr_t mtxa_owner; + volatile uint8_tmtxa_recurse; + volatile uint8_tmtxa_unused[3]; + } a; struct { volatile uint8_tmtxs_flags; ipl_cookie_tmtxs_ipl; @@ -53,7 +57,8 @@ struct kmutex { __cpu_simple_lock_t mtx_lock; }; -#definemtx_owner u.mtxa_owner +#definemtx_owner u.a.mtxa_owner +#definemtx_recurse u.a.mtxa_recurse #definemtx_flags u.s.mtxs_flags #definemtx_ipl u.s.mtxs_ipl Index: sys/arch/arm/include/mutex.h === RCS file: /cvsroot/src/sys/arch/arm/include/mutex.h,v retrieving revision 1.10 diff -u -p -r1.10 mutex.h --- sys/arch/arm/include/mutex.h28 Apr 2008 20:23:14 - 1.10 +++ sys/arch/arm/include/mutex.h31 Jul 2010 22:34:31 - @@ -57,7 +57,11 @@ struct kmutex { struct kmutex { union { /* Adaptive mutex
Re: Modules loading modules?
On Sun, 1 Aug 2010, Antti Kantee wrote: Well, folks, here is a first pass recursive locks! The attached diffs are against -current as of a few minutes ago. Oh, heh, I thought we have recursive lock support. But with that gone from the vfs locks, I guess not (apart from the kernel lock ;). :) I'm not sure if it's a good idea to change the size of kmutex_t. I guess plenty of data structures have carefully been adjusted by hand to its size and I don't know of any automatic way to recalculate that stuff. Even if not, since this is the only user and we probably won't have that many of them even in the future, why not just define a new type ``rmutex'' which contains a kmutex, an owner and the counter? It feels wrong to punish all the normal kmutex users for just one use. It'll also make the implementation a lot simpler to test, since it's purely MI. separate normal case and worst case Good point, and it will be a lot less work, too! :) And it solves the problem of not permitting a rmutex being used with condvars. One question: Since an adaptive kmutex_t already includes an owner field, would we really need to have another copy of it in the rmutex_t structure? - | Paul Goyette | PGP Key fingerprint: | E-mail addresses: | | Customer Service | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com| | Network Engineer | 0786 F758 55DE 53BA 7731 | pgoyette at juniper.net | | Kernel Developer | | pgoyette at netbsd.org | -
Re: Modules loading modules?
On Sun, 1 Aug 2010, Antti Kantee wrote: I'm not sure if it's a good idea to change the size of kmutex_t. I guess plenty of data structures have carefully been adjusted by hand to its size and I don't know of any automatic way to recalculate that stuff. Even if not, since this is the only user and we probably won't have that many of them even in the future, why not just define a new type ``rmutex'' which contains a kmutex, an owner and the counter? It feels wrong to punish all the normal kmutex users for just one use. It'll also make the implementation a lot simpler to test, since it's purely MI. separate normal case and worst case Round two! Taking pooka's suggestion, this version is built on top of (rather than beside) the existing non-recursive mutex. As such, it does not affect any MD code. Attached is a set of diffs that 1. Adds sys/sys/rmutex.h and sys/kern/kern_rmutex.c to implement recursive adaptive mutexes. (Conspicuously missing is an rmutex(9) man page... It will happen before this gets committed.) 2. Converts the existing module_lock from a normal kmutex_t to an rmutex_t 3. Updates all of the (surprisingly many) places where module_lock is acquired. Compile-tested on port-amd64 (including rumptest). Since there are no MD-changes in this version, there shouldn't be any issues with building on other ports. As previously noted, there is only one known use case for this so far: modules loading other modules from within their xxx_modcmd() routine. The specific use case we have involves loading the acpicpu driver/module which eventually results in an attempt to load acpiverbose. It would be really nice if the community could A. Compile-test on additional architectures B. Test to see that existing mutex operations still work correctly C. Exercise the known use case if possible D. Identify additional use cases - | Paul Goyette | PGP Key fingerprint: | E-mail addresses: | | Customer Service | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com| | Network Engineer | 0786 F758 55DE 53BA 7731 | pgoyette at juniper.net | | Kernel Developer | | pgoyette at netbsd.org | -Index: sys/conf/files === RCS file: /cvsroot/src/sys/conf/files,v retrieving revision 1.992 diff -u -p -r1.992 files --- sys/conf/files 7 Jul 2010 01:09:39 - 1.992 +++ sys/conf/files 2 Aug 2010 00:08:26 - @@ -1468,6 +1468,7 @@ file kern/kern_prot.c file kern/kern_ras.c file kern/kern_rate.c file kern/kern_resource.c +file kern/kern_rmutex.c file kern/kern_runq.c file kern/kern_rwlock.c file kern/kern_rwlock_obj.c Index: sys/sys/rmutex.h === RCS file: sys/sys/rmutex.h diff -N sys/sys/rmutex.h --- /dev/null 1 Jan 1970 00:00:00 - +++ sys/sys/rmutex.h2 Aug 2010 00:08:27 - @@ -0,0 +1,53 @@ +/* $NetBSD$*/ + +/*- + * Copyright (c) 2010 The NetBSD Foundation, Inc. + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + *notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + *notice, this list of conditions and the following disclaimer in the + *documentation and/or other materials provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE NETBSD FOUNDATION, INC. AND CONTRIBUTORS + * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED + * TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR + * PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE FOUNDATION OR CONTRIBUTORS + * BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR + * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF + * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS + * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN + * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) + * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE + * POSSIBILITY OF SUCH DAMAGE. + */ + +#ifndef_SYS_RMUTEX_H_ +#define_SYS_RMUTEX_H_ + +#include sys/mutex.h + +/* + * A recursive mutex is simply an adaptive mutex which can be acquired + * multiple times by a single owner. + */ + +struct rmutex { + kmutex_trmtx_mtx; + uint32_trmtx_recurse; +}; + +typedef struct rmutex rmutex_t; + +void rmutex_init(rmutex_t *); +void rmutex_destroy(rmutex_t *); +void rmutex_enter
Re: Modules loading modules?
On Mon, 2 Aug 2010, Quentin Garnier wrote: +int +rmutex_tryenter(rmutex_t *rmtx) +{ + int rv = 1; + + if (mutex_owned(rmtx-rmtx_mtx)) { + rmtx-rmtx_recurse++; + KASSERT(rmtx-rmtx_recurse != 0); + } else if ((rv = mutex_tryenter(rmtx-rmtx_mtx)) != 0) { + rmtx-rmtx_recurse++; + KASSERT(rmtx-rmtx_recurse != 0); + } + return rv; +} I am probably not getting the idea, but I fail to see how this qualifies as a mutex. My reading of this piece of code is that rmutex_enter() is always going to succeed immediately. According to the mutex(9) man page: mutex_owned(mtx) For adaptive mutexes, return non-zero if the current LWP holds the mutex. ... The rmutex_init() code ensures that we have an adaptive mutex. So what this says is If _we_ already own the mutex, increment counter (rv is still initialized to 1) Else if we can acquire the mutex (updating rv) increment the counter Return Please correct me if I'm wrong--and I am very probably wrong--, but isn't the point of a recursive mutex to allow the thread initially taking the mutex to take it again, but not other threads? The code would have to be a lot more complicated than that. All I see here is a fancy no-op, but maybe it's just me. It would be a no-op for a recursive spin mutex. That's why I didn't allow them! :) - | Paul Goyette | PGP Key fingerprint: | E-mail addresses: | | Customer Service | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com| | Network Engineer | 0786 F758 55DE 53BA 7731 | pgoyette at juniper.net | | Kernel Developer | | pgoyette at netbsd.org | -
Re: Modules loading modules?
On Sun, 1 Aug 2010, John Nemeth wrote: I'm thinking the acquisition of module_lock should be pushed into module_autoload() instead of having the caller acquire it for this very reason. It makes it hard to change the way locking works in the MODULAR code if you expect the caller to acquire the lock. I don't know why it was done this way originally, or what the consequences (if any) would be for making the change. Andrew, any thoughts on this? I was thinking the same thing. There is one case in kern/kern_exec.c where the lock was held while the code loops through a list of possible modules. Looking closer, that code actually releases the lock and yield()s before re-acquiring it within the loop, so it probably could be modified without any adverse effects. This would actually simplify the interface, since ALL of the routines would be doing their own locking, and noone in the rest of the kernel would need to know about the lock. - | Paul Goyette | PGP Key fingerprint: | E-mail addresses: | | Customer Service | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com| | Network Engineer | 0786 F758 55DE 53BA 7731 | pgoyette at juniper.net | | Kernel Developer | | pgoyette at netbsd.org | -
Re: Modules loading modules?
On Mon, 2 Aug 2010, Masao Uebayashi wrote: mutex_owned(mtx) ... mutex_owned() is provided for making diagnostic checks to verify that a lock is held. For example: KASSERT(mutex_owned(driver_lock)); It should not be used to make locking decisions at run time, or to verify that a lock is not held. Thus you can't use it for your decision. Yes, I did read the mutex(9) man page. However, your assertion is at odds with pooka's suggestion, from http://mail-index.netbsd.org/tech-kern/2010/08/01/msg008632.html Good point. I think it's ok to do: if (mutex_owned(mtx)) cnt++ else lock - | Paul Goyette | PGP Key fingerprint: | E-mail addresses: | | Customer Service | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com| | Network Engineer | 0786 F758 55DE 53BA 7731 | pgoyette at juniper.net | | Kernel Developer | | pgoyette at netbsd.org | -
Re: Modules loading modules?
On Sun, 1 Aug 2010, Eduardo Horvath wrote: Why is there a need to hold a mutex across module loading instead of, say, having a reference count tht's protected by a mutex? I thought holding a mutex across a potentially blocking operation such as a module load is considered bad practice. Please note that I am not introducing the holding of the lock over the potentially blocking operation. This is already done. Several others have already indicated that the atomicity of the module load (including the loading of any required modules) is a desirable property. I'm simply looking for a mechanism that allows us to keep this property while allowing modules to load other modules. - | Paul Goyette | PGP Key fingerprint: | E-mail addresses: | | Customer Service | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com| | Network Engineer | 0786 F758 55DE 53BA 7731 | pgoyette at juniper.net | | Kernel Developer | | pgoyette at netbsd.org | -
Re: Locking for module_autoload() (was: Modules loading modules?)
On Mon, 2 Aug 2010, Adam Hamsik wrote: On Aug,Monday 2 2010, at 4:27 AM, Paul Goyette wrote: On Sun, 1 Aug 2010, John Nemeth wrote: I'm thinking the acquisition of module_lock should be pushed into module_autoload() instead of having the caller acquire it for this very reason. It makes it hard to change the way locking works in the MODULAR code if you expect the caller to acquire the lock. I don't know why it was done this way originally, or what the consequences (if any) would be for making the change. Andrew, any thoughts on this? Attached are diffs for moving the locking out of each caller and into module_autoload() itself. Most of these changes are fairly trivial, but a couple of them should be looked at more closely in case I've missed any subtleties in the original code: kern/kern_syscall.c net/if_ppp.c I think that this patch has same problem as original code you can call module_autoload twice in same lwp. (you are holding module_lock while calling module_do_load). This patch wasn;t intended to solve the nested-call problem. This patch only purpose was to modify the existing semantics of the module_autoload routine to match that of the other module routines. In particular, module_autoload is the only routine that requires its caller to acquire the mutex first; all the other routines do all the required locking internally to each routine. - | Paul Goyette | PGP Key fingerprint: | E-mail addresses: | | Customer Service | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com| | Network Engineer | 0786 F758 55DE 53BA 7731 | pgoyette at juniper.net | | Kernel Developer | | pgoyette at netbsd.org | -
Re: Modules loading modules?
On Mon, 2 Aug 2010, Antti Kantee wrote: On Mon Aug 02 2010 at 16:30:03 +1000, matthew green wrote: this is an incomplete reading of the manual page, and you can not use mutex_owned() the way you are trying to (regardless of what pooka posted.) you can't even using it in various forms of assertions safely. from the man page: It should not be used to make locking decisions at run time, or to verify that a lock is not held. That's the mantra, yes. ie, you can not even KASSERT(!mutex_owned()). Strictly speaking you can in a case where you have two locks always are taken as l1,l2 and released as l2,l1 provided you're not dealing with a spin mutex. Does it make any sense? no (l2 is useless). Is it possible? yes. Now, on to sensible stuff. I'm quite certain that warning was written to make people avoid writing bad code without understanding locking -- if you need to used mutex_owned() to decide if you should lock a normal mutex, your code is broken. The other less likely possibility is that someone plans to change mutex_owned in the future. Further data point: the same warning is in rw_held, yet it was used to implement recursive locking in vlockmgr until its very recent demise. Ignoring man page mantras and focusing on how the code works, I do not see anything wrong with Paul's use of mutex_owned(). but i'm still not sure why we're going to such lengths to hold a lock across such a heavy weight operation like loading a module. that may involve disk speeds, and so you're looking at waiting millions of cycles for the lock. aka, what eeh posted. It's held for millions of cycles already now and nobody has pointed out measurable problems. But if it is deemed necessary, you can certainly hide a cv underneath. The efficiency requirements for the module lock are probably anyway in the who cares wins ... not spectrum. At least I'm not aware of any fastpath wanting to use it. Anyway, no real opinion there. A cv most likely is the safe, no-brainer choice. Yes, the cv mechanism, coupled with changing the semantics of module_autoload() (to not require the caller to obtain the lock) makes good sense here. I'm working on it and I should be able to post new diffs tomorrow. - | Paul Goyette | PGP Key fingerprint: | E-mail addresses: | | Customer Service | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com| | Network Engineer | 0786 F758 55DE 53BA 7731 | pgoyette at juniper.net | | Kernel Developer | | pgoyette at netbsd.org | -
Re: Modules loading modules?
On Mon, 2 Aug 2010, Paul Goyette wrote: Yes, the cv mechanism, coupled with changing the semantics of module_autoload() (to not require the caller to obtain the lock) makes good sense here. I'm working on it and I should be able to post new diffs tomorrow. OK, it's time for Round #3! Attached are diffs that implement Adam's suggested cv-based solution, along with moving the locking for module_autolock() into the routine itself (for consistency with other routines). I've also updated the module(4) man page this time around. - | Paul Goyette | PGP Key fingerprint: | E-mail addresses: | | Customer Service | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com| | Network Engineer | 0786 F758 55DE 53BA 7731 | pgoyette at juniper.net | | Kernel Developer | | pgoyette at netbsd.org | - MODULE_LOCKING.diff.gz Description: Binary data
re: Modules loading modules?
The most recent verion of this patch set was recently posted, and it uses the cv_wait() mechanism suggested by Adam Hamsik. This discussion is still very intresting, but perhaps we could start a new thread (or at least change the subject line)? :) On Tue, 3 Aug 2010, matthew green wrote: On Mon Aug 02 2010 at 16:30:03 +1000, matthew green wrote: this is an incomplete reading of the manual page, and you can not use mutex_owned() the way you are trying to (regardless of what pooka posted.) you can't even using it in various forms of assertions safely. from the man page: It should not be used to make locking decisions at run time, or to verify that a lock is not held. That's the mantra, yes. ie, you can not even KASSERT(!mutex_owned()). Strictly speaking you can in a case where you have two locks always are taken as l1,l2 and released as l2,l1 provided you're not dealing with a spin mutex. Does it make any sense? no (l2 is useless). Is it possible? yes. what is not possible is to safely to what the manual says. you will not have the right thing happen. i've seen this when i've been working on the sparc64 pmap. Now, on to sensible stuff. I'm quite certain that warning was written to make people avoid writing bad code without understanding locking -- if you need to used mutex_owned() to decide if you should lock a normal mutex, your code is broken. The other less likely possibility is that someone plans to change mutex_owned in the future. Further data point: the same warning is in rw_held, yet it was used to implement recursive locking in vlockmgr until its very recent demise. Ignoring man page mantras and focusing on how the code works, I do not see anything wrong with Paul's use of mutex_owned(). this just does not match my actual experience in the kernel. i had weird pmap-style problems and asserts firing wrongly while i did not obey the rules in the manual directly. .mrg. - | Paul Goyette | PGP Key fingerprint: | E-mail addresses: | | Customer Service | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com| | Network Engineer | 0786 F758 55DE 53BA 7731 | pgoyette at juniper.net | | Kernel Developer | | pgoyette at netbsd.org | -
Re: Modules loading modules?
On Tue, 3 Aug 2010, haad wrote: Well, still no hurry, but it would be nice if some additional eyes were pointed this way. I've got the recursive-module-load test case added to the existing tests/modules/ stuff (and it works)! BTW, since this is changing the kernel ABI, I guess it will need a version bump. Do we have any other bumps coming soon that we can share? I talked with Andrew today and he said that basicaly we can use mutex_owned for this particular case. That should make your code much simpler :) sorry for any additional work :) Well, we've come full circle! :) It seems to me that, even if it is OK to use mutex_owned() for this particular case, perhaps we should not. Most everyone seemed to agree that your condvar solution was technically correct, and if we did use the mutex_owned() solution it would only serve as a temptation for someone else in the future! We now have two solutions, both of which actually work (tested on my home machine). One solution (mutex_owned) might be slightly simpler and slightly more efficient than the other (condvar), but this section of code is certainly not performance-critical. Therefore, is there any reason for preferring the mutex_owned solution? I'm happy to go either way based on concensus here, but I'm leaning towards the condvar solution. It doesn't require any exemptions or long explanations of why it is OK to use mutex_owned in ways that would appear to violate our own documentation. And it also addresses eeh's (and mrg's) concerns of keeping the lock across potentially long-duration operations. - | Paul Goyette | PGP Key fingerprint: | E-mail addresses: | | Customer Service | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com| | Network Engineer | 0786 F758 55DE 53BA 7731 | pgoyette at juniper.net | | Kernel Developer | | pgoyette at netbsd.org | -
Re: Module and device configuration locking [was Re: Modules loading modules?]
On Wed, 4 Aug 2010, Andrew Doran wrote: Sorry for not replying sooner. Please don't add a generic recursive lock facility, it would be abused. Yeah! My current patches have no new generic facilities at all. I'd like something roughly like the following. I think this should also cover major configuration tasks such as device attach and detach, so it wouldn't really be module specific. The naming is a poor suggestion. void sysconfig_lock(void) { if (mutex_owned(sysconfig_mutex)) { /* * At this point it's OK for the thread to hold other * locks, since the system configuration lock was the * first taken on the way in. */ sysconfig_recurse++; return; } /* * I don't remember the correct argument sequence to the * following, but basically we need to assert that NO locks, * sleep or spin, other than kernel_lock are held, * as the system configuration lock would be the outermost * lock in the system. */ LOCKDEBUG_BARRIER(...); mutex_enter(sysconfig_mutex); KASSERT(sysconfig_recurse == 0); } For the boot autoconfig path where you have things being driven by the kernel, this would work OK. Situations where there's a kthread or something attaching and detaching devices on the fly are a bit different, since they likely have local atomicity requirements (need to take device private locks). There you'd probably need the caller to take the lock. USB comes to mind (along with some hardware RAID drivers that do hotplug attach/detach). Thoughts? Well, at first glance, this proposal makes sense. But it is certainly a much larger task than dealing with the immediate problem - modules which want to load other modules. So, I really have three questions: 1. In keeping with earlier concerns about holding locks for long periods of time (eeh and mrg both commented on this), the approach described above would seem to have an even longer hold-time than the current module_lock. Would not something similar to haad's condvar approach be appropriate for this proposal, as well as for the more-limited-in- scope recursive module lock? 2. Would it be reasonable to solve the immediate problem as previously proposed, rather than waiting for this ultimate solution? I think it would be a long time before we could find and resolve all of the issues that might be created in the various threaded situations that may exist. I know I'm certainly not sufficiently qualified to identify all those situations, and I also know that I don't have sufficient available work-hours to do this in any reasonable time- frame. I'd still like to move forward with the most recent solution to the module_lock problem. 3. Since we're still technically abusing the mutex(9) man-page caveat about using mutex_owned() for making locking decisions, it would be very much appreciated (at least by myself) to have an explanation of WHY it is OK in this case to do it here, but not somewhere else. - | Paul Goyette | PGP Key fingerprint: | E-mail addresses: | | Customer Service | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com| | Network Engineer | 0786 F758 55DE 53BA 7731 | pgoyette at juniper.net | | Kernel Developer | | pgoyette at netbsd.org | -
Re: Module and device configuration locking [was Re: Modules loading modules?]
OK, got it. I'll have another set of patches in a few days. On Thu, 5 Aug 2010, Andrew Doran wrote: On Wed, Aug 04, 2010 at 10:28:56AM -0700, Paul Goyette wrote: On Wed, 4 Aug 2010, Andrew Doran wrote: Sorry for not replying sooner. Please don't add a generic recursive lock facility, it would be abused. Yeah! My current patches have no new generic facilities at all. I'd like something roughly like the following. I think this should also cover major configuration tasks such as device attach and detach, so it wouldn't really be module specific. The naming is a poor suggestion. void sysconfig_lock(void) { if (mutex_owned(sysconfig_mutex)) { /* * At this point it's OK for the thread to hold other * locks, since the system configuration lock was the * first taken on the way in. */ sysconfig_recurse++; return; } /* * I don't remember the correct argument sequence to the * following, but basically we need to assert that NO locks, * sleep or spin, other than kernel_lock are held, * as the system configuration lock would be the outermost * lock in the system. */ LOCKDEBUG_BARRIER(...); mutex_enter(sysconfig_mutex); KASSERT(sysconfig_recurse == 0); } For the boot autoconfig path where you have things being driven by the kernel, this would work OK. Situations where there's a kthread or something attaching and detaching devices on the fly are a bit different, since they likely have local atomicity requirements (need to take device private locks). There you'd probably need the caller to take the lock. USB comes to mind (along with some hardware RAID drivers that do hotplug attach/detach). Thoughts? Well, at first glance, this proposal makes sense. But it is certainly a much larger task than dealing with the immediate problem - modules which want to load other modules. So, I really have three questions: 1. In keeping with earlier concerns about holding locks for long periods of time (eeh and mrg both commented on this), the approach described For module lock the wait time isn't really a big concern. Module load and unload is a heavyweight operation so the wait time is expected. There is nothing intrinsically wrong with holding a mutex for a long time provided that you've got a good handle on the potential side effects. It's a different story for other locks in the system like say proc_lock, which can't be held for a long time without disrupting the user experience and/or deadlocking the system. above would seem to have an even longer hold-time than the current module_lock. Would not something similar to haad's condvar approach be appropriate for this proposal, as well as for the more-limited-in- scope recursive module lock? condvar gives you another lock in a roundabout way, without priority inheritance to help move things along if something has clogged the pipework. :-). So I don't see benefit over a mutex. 2. Would it be reasonable to solve the immediate problem as previously proposed, rather than waiting for this ultimate solution? I think it would be a long time before we could find and resolve all of the issues that might be created in the various threaded situations that may exist. I know I'm certainly not sufficiently qualified to identify all those situations, and I also know that I don't have sufficient available work-hours to do this in any reasonable time- frame. I'd still like to move forward with the most recent solution to the module_lock problem. I'm not suggesting that you need to tackle autoconfiguration locking at the same time.. What I'm suggesting is to put the primitives in place (say in kern_lock.c or something) and then use these for the benefit of the modules code. It would provide a statment of direction and avoid re-hashing things later when somebody decides to tackle autoconf. Sorry for being unclear. 3. Since we're still technically abusing the mutex(9) man-page caveat about using mutex_owned() for making locking decisions, it would be very much appreciated (at least by myself) to have an explanation of WHY it is OK in this case to do it here, but not somewhere else. Ok well I think the simplest course of action would be to replace the mutex_owned() with a global variable. It would do much the same thing but for the price of 4 or 8 bytes there is no question of contradicting the mantra. So like: /* * Ok to check this unlocked, as for the value to equal curlwp it must * have been set by the current thread of execution (i.e. not interrupt * context or another LWP). */ l = curlwp; if (sysconfig_lwp == l) { /* recurse */ } else { mutex_enter(sysconfig_lock); sysconfig_lwp = l; /* etc etc
Re: Module and device configuration locking [was Re: Modules loading modules?]
On Thu, 5 Aug 2010, Paul Goyette wrote: OK, got it. I'll have another set of patches in a few days. Well, it was a slow day at the office today, so I found some time to work on this! :) Attached is the latest version of this change. For simplicity, I have broken the patches up into five separate groups: Part 1 defines a set of new kernel locking primitives in file kern/kern_lock.c to implement the recursive kernconfig_mutex. (I'm not really stuck on the name, so open to alternative suggestions!) All of the locking in sys/kern_module.c has been updated to use this instead of the internal module_lock mutex. Additionally, the module_autoload() routine now does its own locking, rather than requiring its caller to do it. And the description of locking in the module(9) man page is updated. Part 2 removes all of the explicit module_{,un}lock() calls from the various xxx_verbose modules that I'd previous worked on. Part 3 removes all of the explicit module_{,un}lock() calls from other bits and pieces of the kernel. In a couple of places, new calls to kernconfig_{,un}lock() are inserted. Part 4 updates rump to provide equivalent locking routines. (pooka, I would appreciate you looking at this.) Part 5 adds a new atf test-case to the existing module tests to verify that recursion actually works! - | Paul Goyette | PGP Key fingerprint: | E-mail addresses: | | Customer Service | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com| | Network Engineer | 0786 F758 55DE 53BA 7731 | pgoyette at juniper.net | | Kernel Developer | | pgoyette at netbsd.org | -Index: sys/sys/param.h === RCS file: /cvsroot/src/sys/sys/param.h,v retrieving revision 1.373 diff -u -p -r1.373 param.h --- sys/sys/param.h 28 Jul 2010 11:03:47 - 1.373 +++ sys/sys/param.h 6 Aug 2010 00:46:05 - @@ -63,7 +63,7 @@ * 2.99.9 (299000900) */ -#define__NetBSD_Version__ 599003800 /* NetBSD 5.99.38 */ +#define__NetBSD_Version__ 599003900 /* NetBSD 5.99.39 */ #define __NetBSD_Prereq__(M,m,p) (M) * 1) + \ (m) * 100) + (p) * 100) = __NetBSD_Version__) Index: sys/sys/systm.h === RCS file: /cvsroot/src/sys/sys/systm.h,v retrieving revision 1.239 diff -u -p -r1.239 systm.h --- sys/sys/systm.h 31 Jan 2010 02:04:43 - 1.239 +++ sys/sys/systm.h 6 Aug 2010 00:46:06 - @@ -492,6 +492,11 @@ void kernel_lock_init(void); void _kernel_lock(int); void _kernel_unlock(int, int *); +void kernconfig_lock_init(void); +void kernconfig_lock(void); +void kernconfig_unlock(void); +bool kernconfig_is_held(void); + #if defined(MULTIPROCESSOR) || defined(_MODULE) #defineKERNEL_LOCK(count, lwp) \ do { \ Index: sys/sys/module.h === RCS file: /cvsroot/src/sys/sys/module.h,v retrieving revision 1.24 diff -u -p -r1.24 module.h --- sys/sys/module.h26 Jun 2010 07:23:57 - 1.24 +++ sys/sys/module.h6 Aug 2010 00:46:07 - @@ -115,7 +115,6 @@ __link_set_add_rodata(modules, name##_mo TAILQ_HEAD(modlist, module); extern struct vm_map *module_map; -extern kmutex_tmodule_lock; extern u_int module_count; extern u_int module_builtinlist; extern struct modlist module_list; Index: sys/kern/kern_lock.c === RCS file: /cvsroot/src/sys/kern/kern_lock.c,v retrieving revision 1.150 diff -u -p -r1.150 kern_lock.c --- sys/kern/kern_lock.c20 Dec 2009 20:42:23 - 1.150 +++ sys/kern/kern_lock.c6 Aug 2010 00:46:07 - @@ -39,6 +39,7 @@ __KERNEL_RCSID(0, $NetBSD: kern_lock.c, #include sys/systm.h #include sys/kernel.h #include sys/lockdebug.h +#include sys/mutex.h #include sys/cpu.h #include sys/syslog.h #include sys/atomic.h @@ -56,6 +57,10 @@ bool kernel_lock_dodebug; __cpu_simple_lock_t kernel_lock[CACHE_LINE_SIZE / sizeof(__cpu_simple_lock_t)] __aligned(CACHE_LINE_SIZE); +static kmutex_t kernconfig_mutex; +static lwp_t *kernconfig_lwp; +static int kernconfig_recurse; + void assert_sleepable(void) { @@ -306,3 +311,59 @@ _kernel_unlock(int nlocks, int *countp) if (countp != NULL) *countp = olocks; } + +/* + * Functions for manipulating the kernel configuration lock. This + * recursive lock should be used to protect all additions and removals + * of of kernel functionality, such as device configuration and loading + * of modular kernel components. + */ + +void +kernconfig_lock_init(void
Re: Module and device configuration locking [was Re: Modules loading modules?]
On Fri, 6 Aug 2010, Mindaugas Rasiukevicius wrote: Hello, Paul Goyette p...@whooppee.com wrote: Well, it was a slow day at the office today, so I found some time to work on this! :) Attached is the latest version of this change. For simplicity, I have broken the patches up into five separate groups: ... Few comments on the patch: -#define__NetBSD_Version__ 599003800 /* NetBSD 5.99.38 */ +#define__NetBSD_Version__ 599003900 /* NetBSD 5.99.39 */ Why bumping? Seemed like the right thing to do, since the locking protocol is changing, as well as the global module_lock mutex disappearing. But I really can't hink of any ill effects, unless someone has a module that suddenly fails link-and-load because it references module_lock. It's not very likely. I'll defer, or I'll wait to ride someone else's bump just to be safe. + * of of kernel functionality, such as device configuration and loading One of too much. Fixed. +void +kernconfig_lock(void) +{ + lwp_t *l; + + /* +* OK to check this unlocked, since it could only be set to +* curlwp by the current thread itself, and not by an interrupt +* or any other LWP. +*/ Please add KASSERT(!cpu_intr_p()); Done. I was worried about this, and forgot to ask. Thanks for picking it up. + l = curlwp; + if (kernconfig_lwp == l) { + kernconfig_recurse++; + KASSERT(kernconfig_recurse != 0); Such (++ then != 0) assert is a bit useless. How about: KASSERT(kernconfig_recurse 1); The intent here was to ensure that we didn't overflow the counter back to zero. It might have made sense when the counter was a uint8_t, but probably not with an int! It can probably be removed completely. + } else { + mutex_enter(kernconfig_mutex); + kernconfig_lwp = l; + kernconfig_recurse = 1; + } +} Somebody will confuse l with 1. :) This came straight from ad@'s suggestions, and there seems to be a lot of precedent in the sys/kern/ code to use l. But I will select a better variable name. + if (--kernconfig_recurse == 0) { + kernconfig_lwp = 0; + mutex_exit(kernconfig_mutex); kernconfig_lwp = NULL; Ooops! :) +bool +kernconfig_is_held(void) +{ + + return (mutex_owned(kernconfig_mutex)); No need for brackets around function. Got it. + KASSERT(kernconfig_is_held); But missing brackets in few cases. Yeah, you're the second one to point that out. I fixed them once, not sure how they crept back in. But they've been fixed again. Otherwise seems OK to me. Thanks. Thank you for the detailed look. - | Paul Goyette | PGP Key fingerprint: | E-mail addresses: | | Customer Service | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com| | Network Engineer | 0786 F758 55DE 53BA 7731 | pgoyette at juniper.net | | Kernel Developer | | pgoyette at netbsd.org | -
Re: Module and device configuration locking [was Re: Modules loading modules?]
On Thu, 5 Aug 2010, Paul Goyette wrote: On Thu, 5 Aug 2010, Paul Goyette wrote: OK, got it. I'll have another set of patches in a few days. Well, it was a slow day at the office today, so I found some time to work on this! :) Attached is the latest version of this change. For simplicity, I have broken the patches up into five separate groups: Part 1 defines a set of new kernel locking primitives in file kern/kern_lock.c to implement the recursive kernconfig_mutex. (I'm not really stuck on the name, so open to alternative suggestions!) All of the locking in sys/kern_module.c has been updated to use this instead of the internal module_lock mutex. Additionally, the module_autoload() routine now does its own locking, rather than requiring its caller to do it. And the description of locking in the module(9) man page is updated. Part 2 removes all of the explicit module_{,un}lock() calls from the various xxx_verbose modules that I'd previous worked on. Part 3 removes all of the explicit module_{,un}lock() calls from other bits and pieces of the kernel. In a couple of places, new calls to kernconfig_{,un}lock() are inserted. Part 4 updates rump to provide equivalent locking routines. (pooka, I would appreciate you looking at this.) Part 5 adds a new atf test-case to the existing module tests to verify that recursion actually works! Following up on all the various comments from jmcneil@, pooka@, and rmind@, I've attached a revised set of diffs. The most significant change between this and the previous revision is the separation of the kernconfig_lock_*() stuff into a separate kern_cfg.c source file. This allows inclusion of the kernel code directly into rump, rather than having to re-implement it in rump/klock.c - | Paul Goyette | PGP Key fingerprint: | E-mail addresses: | | Customer Service | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com| | Network Engineer | 0786 F758 55DE 53BA 7731 | pgoyette at juniper.net | | Kernel Developer | | pgoyette at netbsd.org | - ML.diff.tgz Description: Binary data
Length of wmesg for condvar?
The condvar(9) man page states The wmesg argument specifies a string of no more than 8 characters... yet there is at least one instance in the kernel sources where we have a wmesg long than 8 characters! In src/sys/kern/kern_module.c in routine module_init() we have cv_init(module_thread_cv, modunload); There appear to be several other similar violations in various places: src/sys/arch/x86/x86/ipmi.c src/sys/arch/xen/xen/balloon.c src/sys/dev/isa/fd.c src/sys/dev/tprof/tprof.c src/sys/dist/ipf/netinet/ip_auth.c src/sys/dist/ipf/netinet/ip_log.c src/sys/dist/ipf/netinet/ip_sync.c src/sys/fs/nilfs/nilfs_subr.c src/sys/fs/nilfs/nilfs_vfsops.c src/sys/kern/sys_pipe.c src/sys/net/agr/if_agr.c src/sys/opencrypto/crypto.c src/sys/rump/librump/rumpkern/sysproxy_socket.c Should these be changed? Are there any adverse effects from having a wmesg longer than 8 characters? - | Paul Goyette | PGP Key fingerprint: | E-mail addresses: | | Customer Service | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com| | Network Engineer | 0786 F758 55DE 53BA 7731 | pgoyette at juniper.net | | Kernel Developer | | pgoyette at netbsd.org | -
Re: Length of wmesg for condvar?
On Mon, 9 Aug 2010, Matthew Mondor wrote: On Sun, 8 Aug 2010 17:23:23 -0700 (PDT) Paul Goyette p...@whooppee.com wrote: Should these be changed? Are there any adverse effects from having a wmesg longer than 8 characters? It seems to me that the exporter of those use strncpy() (i.e. kern/init_sysctl.c) and that the structures use WMESGLEN and KI_WMESGLEN both defined as 8. So other than inadvertently truncated names it at least should not cause corruption, but I think that truncated names could also be problematic when trying to distinguish two strings starting with the same 8 characters (is that likely now)? Especially when the only thing that differs between two states is some suffix like rd and rw... After all, those are intended for humans :) Does anyone object to my going through and coming up with shorter names (= 8 chars) for these condvars? I'll leave changing the exporters from strncpy() to strlcpy() for another day. - | Paul Goyette | PGP Key fingerprint: | E-mail addresses: | | Customer Service | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com| | Network Engineer | 0786 F758 55DE 53BA 7731 | pgoyette at juniper.net | | Kernel Developer | | pgoyette at netbsd.org | -
Re: Module and device configuration locking [was Re: Modules loading modules?]
Updating the status on these changes: One comment questioned whether or not a version bump was required, and I've more-or-less convinced myself it is at least desired. While properly-working modules from the pre-update will continue to work on a post-update kernel, the reverse is not necessarily true. A module written for a post-update kernel and which takes advantage of the changed locking protocol will fail on a pre-update kernel. Another comment suggested that the name of newly-created file kern/kern_cfg.c should be changed to more closely match its contents, while an earlier comment had suggested generalizing the filename! I've taken the more-specific path and the file is now called kern_cfglock.c If other stuff gets added to this file later, we can come up with a more generic name at that time. Some additional changes have been included in this latest set of diffs. Mostly, there were some KASSERT()s sprinkled throughout that tried to ensure that the code had not been called recursively. Since recursion is now explicitly supported, all of the module_active stuff has been reworked. Additionally, there was a statically-allocated list of pending modules that needed to have their initialization completed. This has been changed to keep multiple stack-allocated lists, one for each depth. I'm planning to commit these changes next weekend, unless there are any significant objections. If anyone else needs to coordinate changes in order to ride-the-version-bump, let me know. - | Paul Goyette | PGP Key fingerprint: | E-mail addresses: | | Customer Service | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com| | Network Engineer | 0786 F758 55DE 53BA 7731 | pgoyette at juniper.net | | Kernel Developer | | pgoyette at netbsd.org | - ML.diff.tgz Description: Binary data
Re: kernel module loading vs securelevel
If I remember correctly, autoload bypasses the authorization calls within the module subsystem. Autoload also works ONLY for modules that are either built-in, passed by the bootloader, or located in the official directory; autoloading from the filesystem is prohibited if the pathname contains any '/'. On Sat, 16 Oct 2010, Izumi Tsutsui wrote: XXX: module files can be loaded only on single user? It looks kernel modules can't be loaded on multi user, i.e. if kernel securelevel is 1, unless options INSECURE is specified. i386 has options INSECURE by default so it just works, but is it intended feature? It would seem to be intentional. After all, kernel modules can do all sorts of nasty things if they want to. In that case, module autoload/autounload is not functional at all and we have to specify all possible necessary modules explicitly during boot time?? --- Izumi Tsutsui !DSPAM:4cb91b7e2433872420069! - | Paul Goyette | PGP Key fingerprint: | E-mail addresses: | | Customer Service | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com| | Network Engineer | 0786 F758 55DE 53BA 7731 | pgoyette at juniper.net | | Kernel Developer | | pgoyette at netbsd.org | -
Re: kernel module loading vs securelevel
On Sat, 16 Oct 2010, Izumi Tsutsui wrote: autoload/autounload does NOT perform any authorization checks - please look at the code! No checking of securelevel occurs, as far as I can see. For autoload, the module name must not contain a '/', so if the module is being loaded from the file system it must be loaded from the blessed /stand/${ARCH}/${VERSION}/modules directory. Including the INSECURE option will have no effect on autoloading of modules. Hmm. I built MODULAR kernels on news68k and sun3 (which didn't have INSECURE) but I couldn't use TMPFS or execute a.out binaries on multiuser though they worked after shutdown(8) or on single user. The code doesn't work as intended and just we should fix it? Hmmm. Maybe I am reading the code wrong. But the intent of the code seems to be quite clear. The manual load explicitly calls kauth_...() while the auto-load path does not make any such call. Perhaps there is an additional authorization call in kobj_load_vfs() (which does the actual loading). A quick grep of subr_kobj*.c for kauth_ does not reveal anything obvious. Could you rerun your testing after setting sysctl kern.module.verbose? This should provide extra kernel debug printf() messages... - | Paul Goyette | PGP Key fingerprint: | E-mail addresses: | | Customer Service | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com| | Network Engineer | 0786 F758 55DE 53BA 7731 | pgoyette at juniper.net | | Kernel Developer | | pgoyette at netbsd.org | -
Re: kernel module loading vs securelevel
Hmmm. OK, I will build an i386 kernel _without_ INSECURE and see if I can reproduce the problem. On Sat, 16 Oct 2010, Izumi Tsutsui wrote: Could you rerun your testing after setting sysctl kern.module.verbose? This should provide extra kernel debug printf() messages... There are some info during boot (before securelevel is set to 1): --- : Mounting all filesystems... DEBUG: module: plist load returned error 2 for `/stand/sun3/5.99.39/modules/mfs/mfs.kmod' DEBUG: module: plist load returned error 2 for `/stand/sun3/5.99.39/modules/kernfs/kernfs.kmod' DEBUG: module: plist load returned error 2 for `/stand/sun3/5.99.39/modules/procfs/procfs.kmod' DEBUG: module: plist load returned error 2 for `/stand/sun3/5.99.39/modules/ptyfs/ptyfs.kmod' DEBUG: module: cannot unload module `mfs' error=16 DEBUG: module: cannot unload module `kernfs' error=16 DEBUG: module: cannot unload module `procfs' error=16 Clearing temporary files. DEBUG: module: cannot unload module `ptyfs' error=16 Starting amd. Creating a.out runtime link editor directory cache. Checking quotas: done. Setting securelevel: kern.securelevel: 0 - 1 : --- All file systems (mfs, kernfs, procfs, and ptyfs) are mounted properly, though. On multi user, there is no particular info: --- # mount -t tmpfs tmpfs /mnt mount_tmpfs: tmpfs on /mnt: Operation not supported by device # shutdown now Shutdown NOW! : Enter pathname of shell or RETURN for /bin/sh: Terminal type is kterm. We recommend creating a non-root account and using su(1) for root access. # mount -t tmpfs tmpfs /mnt DEBUG: module: plist load returned error 2 for `/stand/sun3/5.99.39/modules/tmpfs/tmpfs.kmod' chariot# df /mnt Filesystem 1K-blocks Used Avail %Cap Mounted on tmpfs 49480 8 49472 0% /mnt # DEBUG: module: cannot unload module `tmpfs' error=16 --- Izumi Tsutsui !DSPAM:4cb9a96b2437545078506! - | Paul Goyette | PGP Key fingerprint: | E-mail addresses: | | Customer Service | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com| | Network Engineer | 0786 F758 55DE 53BA 7731 | pgoyette at juniper.net | | Kernel Developer | | pgoyette at netbsd.org | -
Re: kernel module loading vs securelevel
On Sat, 16 Oct 2010, John Nemeth wrote: sys/kern/kern_module.c:module_autoload() makes almost the exact same call to kauth_authorize_system as does module_load(). The difference is that the second last parm is (void *)(uintptr_t)1. What difference this makes is going to be buried in the bowels of kauth, and I'm not going to dig through that at this moment. Hmmm, missed that one. You're right. The arg is, I think, the one that gets passsed to the kauth listener as arg2, which should cause us to return KAUTH_RESULT_ALLOW for autoload: static int module_listener_cb(kauth_cred_t cred, kauth_action_t action, void *cookie, void *arg0, void *arg1, void *arg2, void *arg3) { int result; result = KAUTH_RESULT_DEFER; if (action != KAUTH_SYSTEM_MODULE) return result; if ((uintptr_t)arg2 != 0) /* autoload */ result = KAUTH_RESULT_ALLOW; return result; } }-- End of excerpt from Paul Goyette !DSPAM:4cb9cb6e2431088414278! - | Paul Goyette | PGP Key fingerprint: | E-mail addresses: | | Customer Service | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com| | Network Engineer | 0786 F758 55DE 53BA 7731 | pgoyette at juniper.net | | Kernel Developer | | pgoyette at netbsd.org | -
Re: kernel module loading vs securelevel
On Sun, 17 Oct 2010, Masao Uebayashi wrote: On Sat, Oct 16, 2010 at 08:57:04AM -0700, John Nemeth wrote: On Jan 31, 5:14pm, Paul Goyette wrote: } On Sat, 16 Oct 2010, Izumi Tsutsui wrote: } } Hmm, what do you think about this feature? } Only available in INSECURE environment? } } We trust modules at the time when they're installed into the trusted } place, same as kernel itself. I think prohibiting module load at } run-time is rather pointless. } } Well I think the point is whether we should require INSECURE or not } to use module autoload/autounload after multiuser. } } If we should I'll enable options INSECURE by default on ports } that require options MODULAR (to save kernel file size). } } autoload/autounload does NOT perform any authorization checks - please } look at the code! No checking of securelevel occurs, as far as I can I just did and autoload most certainly does do authorization checks. } see. For autoload, the module name must not contain a '/', so if the } module is being loaded from the file system it must be loaded from the } blessed /stand/${ARCH}/${VERSION}/modules directory. Including the } INSECURE option will have no effect on autoloading of modules. } } Manual loading and unloading of modules does involve calls to } kauth_authorize_system() which will check securelevel. sys/kern/kern_module.c:module_autoload() makes almost the exact same call to kauth_authorize_system as does module_load(). The difference is that the second last parm is (void *)(uintptr_t)1. What difference this makes is going to be buried in the bowels of kauth, and I'm not going to dig through that at this moment. I read it as: a) module_listener_cb() checks the 2nd arg, if autoload, judge the auth as allow b) secmodel_securelevel_system_cb() denies the auth if (securelevel 0) regardless of autoload Kauth disallows the auth, because 0 listner(s) denied. Ah, I see. That definitely explains it. - | Paul Goyette | PGP Key fingerprint: | E-mail addresses: | | Customer Service | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com| | Network Engineer | 0786 F758 55DE 53BA 7731 | pgoyette at juniper.net | | Kernel Developer | | pgoyette at netbsd.org | -
Re: kernel module loading vs securelevel
On Sat, 16 Oct 2010, David Holland wrote: And also make the blessed directory itself immutable? :) As I recall the semantics of immutable are such that this isn't necessary to protect modules that are present at boot time (that is, they can't be unlinked/renamed/etc.), and if there are autoloadable modules whose names aren't present at boot time, they'll fail the check. I've already misread the code here once, but... As far as I can tell, each time a module_autoload call is made, if the module is neither built-in nor passed in by the boot loader, the code will attempt to load it via a call to kobj_load_vfs() which has path as an argument. It doesn't appear to me that there is any pre-approved list of acceptable objects that can be loaded from the file system. BTW, does the immutable flag prevent one from using an immutable directory as the mount-point for some other file system? Hmmm... - | Paul Goyette | PGP Key fingerprint: | E-mail addresses: | | Customer Service | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com| | Network Engineer | 0786 F758 55DE 53BA 7731 | pgoyette at juniper.net | | Kernel Developer | | pgoyette at netbsd.org | -
Re: kernel module loading vs securelevel
On Sun, 17 Oct 2010, Izumi Tsutsui wrote: } I'm just asking if options INSECURE is mandaory to use autoloading, } not module/autoloading is secure/silly/boo or not. } } No. As far as I can tell, there's a bug in the relevant kauth listener, } at least in terms of the original intent of the author of the autoloading } code; the system scope kauth listener should return DEFER, not DENY. module_listener_cb() was added to kern_module.c in revision 1.51 by elad. The kauth_authorize_system() calls were added to kern_module.c by ad, but the respective commit log messages doesn't say anything about them, so the original intent of the author of the autoloading code (ad) is unclear. The following patch makes autoload works even on securelevel 0, but I'm not sure if it's correct and acceptable. If not, options INSECURE is the only way to enable it.. Based on the discussion regarding the numerous ways in which this could be abused, I would personally vote for requiring INSECURE. At least that way, things are pretty clear. If the proposed patch were used, then you would have only options MODULAR in the kernel config file, which is not at all clear about the security of the resulting kernel. Index: secmodel/securelevel/secmodel_securelevel.c === RCS file: /cvsroot/src/sys/secmodel/securelevel/secmodel_securelevel.c,v retrieving revision 1.20 diff -u -p -r1.20 secmodel_securelevel.c --- secmodel/securelevel/secmodel_securelevel.c 7 Oct 2009 01:06:57 - 1.20 +++ secmodel/securelevel/secmodel_securelevel.c 16 Oct 2010 22:15:11 - @@ -254,7 +254,7 @@ secmodel_securelevel_system_cb(kauth_cre break; case KAUTH_SYSTEM_MODULE: - if (securelevel 0) + if ((uintptr_t)arg2 == 0 securelevel 0) result = KAUTH_RESULT_DENY; break; --- Izumi Tsutsui !DSPAM:4cba24f22438312397761! - | Paul Goyette | PGP Key fingerprint: | E-mail addresses: | | Customer Service | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com| | Network Engineer | 0786 F758 55DE 53BA 7731 | pgoyette at juniper.net | | Kernel Developer | | pgoyette at netbsd.org | -
Re: Getting rid of standard boiler plate in kernel man pages CODE REFERENCES
On Mon, 8 Nov 2010, David Holland wrote: Not from me! Although rather than /usr/src it might say the top level of the source tree. How about: [...] +Any paths are relative to the top level of the source tree (usually +.Pa /usr/src ) . Sounds fine. Or perhaps historically instead of usually :) Or often might work, too. No opinion on that :-) typically or frequently are two additional possibilities! - | Paul Goyette | PGP Key fingerprint: | E-mail addresses: | | Customer Service | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com| | Network Engineer | 0786 F758 55DE 53BA 7731 | pgoyette at juniper.net | | Kernel Developer | | pgoyette at netbsd.org | -
ioctl(2) vs sys/ioctl.h
Is there some reason why there is a discrepancy in the definition of ioctl()? From man page ioctl(2) SYNOPSIS #include sys/ioctl.h int ioctl(int d, unsigned long request, void *argp); Yet, from sys/ioctl.h we have __BEGIN_DECLS int ioctl(int, unsigned long, ...); __END_DECLS - | Paul Goyette | PGP Key fingerprint: | E-mail addresses: | | Customer Service | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com| | Network Engineer | 0786 F758 55DE 53BA 7731 | pgoyette at juniper.net | | Kernel Developer | | pgoyette at netbsd.org | -
Re: ioctl(2) vs sys/ioctl.h
On Sun, 19 Dec 2010, Christos Zoulas wrote: In article pine.neb.4.64.1012181604050.27...@quicky.whooppee.com, Paul Goyette p...@whooppee.com wrote: Is there some reason why there is a discrepancy in the definition of ioctl()? From man page ioctl(2) SYNOPSIS #include sys/ioctl.h int ioctl(int d, unsigned long request, void *argp); Yet, from sys/ioctl.h we have __BEGIN_DECLS int ioctl(int, unsigned long, ...); __END_DECLS Most of our ioctl's take pointer arguments. Some streams ioctls though take int arguments (ioctl(fd, I_FLUSH, FLUSHR) for example) and using void * as the argument would not compile cleanly. I think that we should not have void * in the man page either. Should the man page be updated to match reality? - | Paul Goyette | PGP Key fingerprint: | E-mail addresses: | | Customer Service | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com| | Network Engineer | 0786 F758 55DE 53BA 7731 | pgoyette at juniper.net | | Kernel Developer | | pgoyette at netbsd.org | -
Re: ps_strings refactoring compat32 fix
On Wed, 2 Mar 2011, Joerg Sonnenberger wrote: Hi all, all programs get the argument and environment placed onto the initial stack in a structure called ps_strings. NetBSD currently doesn't export the correct version of this structure for 32bit binaries as in it doesn't use the 32bit version of it. This breaks setproctitle(3) and other users of it. The attached patch refactors the access in procfs and kern.proc as well as making it honour 32bit. The Darwin code likely is still broken, if someone wants to use that, it should be easy to adopt the changes from copyin_procargs. Also attached are two programs to test the consistency of the structure and updating it (so that ps(1) can be used to ensure it works), but I will leave it to someone with more interest in atf(7) to turn them into full test cases. If the patch gets committed, I'll take on the job of atf-ifying the tests. - | Paul Goyette | PGP Key fingerprint: | E-mail addresses: | | Customer Service | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com| | Network Engineer | 0786 F758 55DE 53BA 7731 | pgoyette at juniper.net | | Kernel Developer | | pgoyette at netbsd.org | -
Re: covering set of PCI kernels?
I would be surprised if there are not some drivers in i386/conf/ALL that are not in GENERIC... On Thu, 31 Mar 2011, David Young wrote: Today I am trying to answer the question, what is the smallest set of kernel configurations to build, in order to compile all machine-dependent PCI code and all PCI drivers at least once? The answer I have come up with so far is, All GENERIC kernels for architectures having a pci/ directory, that is, these GENERIC kernels: % ls -d sys/arch/*/pci | sed 's/\/pci$//' | sed 's/$/\/conf\/GENERIC/' | \ xargs ls -d ls: sys/arch/algor/conf/GENERIC: No such file or directory ls: sys/arch/atari/conf/GENERIC: No such file or directory ls: sys/arch/powerpc/conf/GENERIC: No such file or directory ls: sys/arch/sgimips/conf/GENERIC: No such file or directory ls: sys/arch/x86/conf/GENERIC: No such file or directory sys/arch/alpha/conf/GENERIC sys/arch/ibmnws/conf/GENERIC sys/arch/arc/conf/GENERIC sys/arch/macppc/conf/GENERIC sys/arch/bebox/conf/GENERIC sys/arch/mvmeppc/conf/GENERIC sys/arch/cats/conf/GENERIC sys/arch/netwinder/conf/GENERIC sys/arch/cobalt/conf/GENERICsys/arch/ofppc/conf/GENERIC sys/arch/i386/conf/GENERIC sys/arch/prep/conf/GENERIC sys/arch/ia64/conf/GENERIC sys/arch/sandpoint/conf/GENERIC Plus some subset of the kernel configuration files in sys/arch/algor/conf/ sys/arch/atari/conf/ sys/arch/sgimips/conf/ (Which ones?) And an unknown number of powerpc configurations: evbppc, rs6000, what else? Dave -- David Young OJC Technologies dyo...@ojctech.com Urbana, IL * (217) 344-0444 x24 !DSPAM:4d94ed852444056616303! - | Paul Goyette | PGP Key fingerprint: | E-mail addresses: | | Customer Service | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com| | Network Engineer | 0786 F758 55DE 53BA 7731 | pgoyette at juniper.net | | Kernel Developer | | pgoyette at netbsd.org | -
Re: Interrupt issues with amd64 and an IXSystems 1U server (5.1)
I had a similar issue on a SuperMicro motherboard. The attached patch is for -current but should not be too hard to adapt to 5.1 On Thu, 12 May 2011, Brian Buhrow wrote: Hello. I did try doing a diff of working and non-working kernels, but the differences weren't obvious to me. However, I think the problem is that the network chips are interrupting on ioapic1, where as everything else is interrupting on ioapic0. It looks like even though ioapic1 interrupts are showing up in vmstat -i output, they're not actually causing the drivers to get triggered. It looks like this is a potentially known issue, but I can't find any particular patches which address the problem. Anyone else ever seen times when interrupts from ioapic1 or higher don't get routed properly? -thanks -Brian On May 12, 11:54pm, Manuel Bouyer wrote: } Subject: Re: Interrupt issues with amd64 and an IXSystems 1U server (5.1) } On Thu, May 12, 2011 at 12:12:54PM -0700, Brian Buhrow wrote: } Hello. I've got anew system I'm building where I have interrupt } issues with with the GENERIC kernel. This is an amd64 system with } NetBSD-5.1 with sources as of about 3 weeks ago. } The INSTALL kernel works fine, but when I boot the GENERIC kernel, the } ethernet doesn't work. I get a bunch of device timeout messages. Both the } INSTALL and GENERIC kernels are built from the same 5.1 sources. } Anyone have ideas on why the INSTALL kernel works fine, but the } GENERIC kernel does not? They look to be the same to me in so far as the } INSTALL config includes the GENERIC config. I rebuilt the GENERIC kernel } to turn off MTRR support, as the INSTALL kernel does, but that doesn't } help, though it does remove the FIXME message about there being too many } MTRR devices on the system. } } Did you try to diff the dmesg of working and non-working kernels ? } } -- } Manuel Bouyer bou...@antioche.eu.org } NetBSD: 26 ans d'experience feront toujours la difference } -- -- End of excerpt from Manuel Bouyer !DSPAM:4dcc62302402595018236! - | Paul Goyette | PGP Key fingerprint: | E-mail addresses: | | Customer Service | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com| | Network Engineer | 0786 F758 55DE 53BA 7731 | pgoyette at juniper.net | | Kernel Developer | | pgoyette at netbsd.org | -Index: src/sys/arch/x86/x86/mpacpi.c === RCS file: /cvsroot/src/sys/arch/x86/x86/mpacpi.c,v retrieving revision 1.91 diff -u -p -r1.91 mpacpi.c --- src/sys/arch/x86/x86/mpacpi.c 5 Apr 2011 13:17:04 - 1.91 +++ src/sys/arch/x86/x86/mpacpi.c 12 May 2011 23:04:06 - @@ -625,7 +625,9 @@ mpacpi_derive_bus(ACPI_HANDLE handle, st if (ACPI_FAILURE(rv)) goto out; - if (acpi_match_hid(devinfo, pciroot_hid)) { + if (acpi_match_hid(devinfo, pciroot_hid) + ((devinfo-Valid ACPI_VALID_STA) == 0 || + (devinfo-CurrentStatus ACPI_STA_OK) == ACPI_STA_OK)) { rv = mpacpi_get_bbn(acpi, parent, bus); if (ACPI_FAILURE(rv)) bus = 0;
Build and include hdaudio hdafg as modules
Now that Jared has modularized the hdaudio and hdafg drivers, is there any reason that we should not be building the modules by default? And is there any reason why they should not be removed from i386/GENERIC and amd64/GENERIC (and, of course, added to i386/MONOLITHIC)? Patch attached. - | Paul Goyette | PGP Key fingerprint: | E-mail addresses: | | Customer Service | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com| | Network Engineer | 0786 F758 55DE 53BA 7731 | pgoyette at juniper.net | | Kernel Developer | | pgoyette at netbsd.org | -Index: distrib/sets/lists/modules/mi === RCS file: /cvsroot/src/distrib/sets/lists/modules/mi,v retrieving revision 1.21 diff -u -p -r1.21 mi --- distrib/sets/lists/modules/mi 26 Feb 2011 18:07:15 - 1.21 +++ distrib/sets/lists/modules/mi 28 May 2011 13:18:18 - @@ -52,6 +52,10 @@ ./@MODULEDIR@/flash/flash.kmod base-kernel-modules kmod ./@MODULEDIR@/fss base-kernel-modules kmod ./@MODULEDIR@/fss/fss.kmod base-kernel-modules kmod +./@MODULEDIR@/hdafgbase-kernel-modules kmod +./@MODULEDIR@/hdafg/hdafg.kmod base-kernel-modules kmod +./@MODULEDIR@/hdaudio base-kernel-modules kmod +./@MODULEDIR@/hdaudio/hdaudio.kmod base-kernel-modules kmod ./@MODULEDIR@/hfs base-kernel-modules kmod ./@MODULEDIR@/hfs/hfs.kmod base-kernel-modules kmod ./@MODULEDIR@/kernfs base-kernel-modules kmod Index: sys/arch/amd64/conf/GENERIC === RCS file: /cvsroot/src/sys/arch/amd64/conf/GENERIC,v retrieving revision 1.318 diff -u -p -r1.318 GENERIC --- sys/arch/amd64/conf/GENERIC 1 Apr 2011 12:11:17 - 1.318 +++ sys/arch/amd64/conf/GENERIC 28 May 2011 13:18:30 - @@ -992,8 +992,8 @@ opl*at fms? opl* at sv? # High Definition Audio -hdaudio* at pci? dev ? function ?# High Definition Audio -hdafg* at hdaudiobus? +#hdaudio* at pci? dev ? function ?# High Definition Audio +#hdafg*at hdaudiobus? # Audio support audio* at audiobus? Index: sys/arch/i386/conf/GENERIC === RCS file: /cvsroot/src/sys/arch/i386/conf/GENERIC,v retrieving revision 1.1032 diff -u -p -r1.1032 GENERIC --- sys/arch/i386/conf/GENERIC 28 May 2011 13:01:49 - 1.1032 +++ sys/arch/i386/conf/GENERIC 28 May 2011 13:18:38 - @@ -1369,8 +1369,8 @@ opl* at yds? opl* at ym? # High Definition Audio -hdaudio* at pci? dev ? function ?# High Definition Audio -hdafg* at hdaudiobus? +#hdaudio* at pci? dev ? function ?# High Definition Audio +#hdafg*at hdaudiobus? # Audio support audio* at audiobus? Index: sys/arch/i386/conf/MONOLITHIC === RCS file: /cvsroot/src/sys/arch/i386/conf/MONOLITHIC,v retrieving revision 1.15 diff -u -p -r1.15 MONOLITHIC --- sys/arch/i386/conf/MONOLITHIC 6 Mar 2011 17:08:26 - 1.15 +++ sys/arch/i386/conf/MONOLITHIC 28 May 2011 13:18:45 - @@ -66,3 +66,7 @@ pseudo-device putter # for puffs and pu pseudo-device pad # pseudo audio device driver pseudo-device dm # device-mapper device driver + +# High Definition Audio +hdaudio* at pci? dev ? function ?# High Definition Audio +hdafg* at hdaudiobus? Index: sys/modules/Makefile === RCS file: /cvsroot/src/sys/modules/Makefile,v retrieving revision 1.70 diff -u -p -r1.70 Makefile --- sys/modules/Makefile14 Apr 2011 15:45:27 - 1.70 +++ sys/modules/Makefile28 May 2011 13:18:50 - @@ -23,6 +23,8 @@ SUBDIR+= ffs SUBDIR+= filecore SUBDIR+= flash SUBDIR+= fss +SUBDIR+= hdafg +SUBDIR+= hdaudio SUBDIR+= hfs SUBDIR+= kernfs SUBDIR+= ksem
Strange behaviour
The system: dual-12-core AMD Opteron-6172 (total 24 cores at 3.2GHz), and 32GB RAM. (The motherboard is a ASUS KGPE-D16). While running a 'build.sh -j48 release' I notice that there seems to be a rather excessive amount of System Time. Running systat(1) I can see long periods of 60% to 70% (or even more), while at no time do I ever get more than 15% to 20% of user time. The only thing that seems to be running (according to vmstat) at many thousands of Global TLB IPI interrupts, although Interrupt time is mostly invisible. (It never shows up more than 1% or 2% on systat, and a xosview never shows any interrupt time at all.) Other than the 5K to 10K (with occassional bursts of 18K) TLB IPI, the only other interrupt activity is a few hundred disk interrupts, and maybe a few dozen network interrupts. Is this expected behavior? - | Paul Goyette | PGP Key fingerprint: | E-mail addresses: | | Customer Service | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com| | Network Engineer | 0786 F758 55DE 53BA 7731 | pgoyette at juniper.net | | Kernel Developer | | pgoyette at netbsd.org | -
Re: Strange behaviour
BTW, it was definitely a typo in the original - my system is 24 cores at 2.1GHz, not 3.2GHz. On Sun, 29 May 2011, Thor Lancelot Simon wrote: On Sun, May 29, 2011 at 06:24:06PM -0700, Paul Goyette wrote: The system: dual-12-core AMD Opteron-6172 (total 24 cores at 3.2GHz), and 32GB RAM. (The motherboard is a ASUS KGPE-D16). While running a 'build.sh -j48 release' I notice that there seems to be a rather excessive amount of System Time. Running systat(1) I [...] Is this expected behavior? Yes. We do a little better with 16 build jobs than 12 and considerably worse with 20 than 16. On my 24-core 2Ghz Opteron, the only thing I've tried that really can do an effective job building NetBSD with more than 12 CPUs in use is Linux. I should probably try Solaris. Despite the level of System time, I was able to run a complete build.sh -j48 release in less than 45 minutes. And afterwards, I was able to build a complete new kernel in 44 seconds. (I guess most all of the source directory was cached somewhere in that 32GB of RAM!) Linux under Xen does even worse than NetBSD, probably because Xen has no way to express to Linux the system's memory architecture. Xen has NUMA support in the sense that if you have, say, 6 cores assigned to a virtual and those 6 can be mapped onto physical cores on the same NUMA node, thus providing the virtual a uniform memory hierarchy, it will do so; but if you have a 24-core machine built from two pairs of dual-6-core chips, like the machines you and I have, it really doesn't do a good job telling the guest what it needs to schedule effectively. - | Paul Goyette | PGP Key fingerprint: | E-mail addresses: | | Customer Service | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com| | Network Engineer | 0786 F758 55DE 53BA 7731 | pgoyette at juniper.net | | Kernel Developer | | pgoyette at netbsd.org | -
Re: mount -o extattr (extended attributes)
It looks to me like we might also need a rather extensive set of atf regression tests to make sure that future maintenance doesn't break anything... :) On Thu, 16 Jun 2011, Thor Lancelot Simon wrote: On Thu, Jun 16, 2011 at 03:47:27PM +, Emmanuel Dreyfus wrote: Hello Here is a patch that adds mount -o extattr for auto-enabling UFS1 extended attributes. http://ftp.espci.fr/shadow/manu/mount_extattr.patch Any comment on it before I commit? The next step is to add UFS_EXTATTR to GENERIC, XEN3_DOM0 and XEN3_DOMU kernels (and others? which ones?). If there a smart way, or should I do it for all ports? The more I think about this code -- not yours, but the original implementation -- the less comfortable I am with it. It seems to be something of a conceptual mess: it keeps metadata in plain files and treats the metadata writes like file data writes, going around our entire mechanism for preserving metadata integrity. As you've observed, this poses an increased risk of filesystem corruption, and this code is not sufficiently robust to handle such corruption on system startup without crashing the kernel. And there are locking issues with magically creating file Y when an operation is done on file X, some of which you've found/fixed, but I bet there are more. Given the way this code works, creating a shadow tree of files elsewhere in the filesystem namespace, it seems like it is almost the canonical example of something that should be a layered filesystem rather than internally implemented in FFS. Then it would be possible, for example, to mount the upper layer sync so that attribute writes became synchronous and there were much less risk of corruption on crash. Since extended attributes are often used to store data that are used for authentication purposes, it seems to me there will potentially be a dramatic change in system behavior when they are turned on as the system goes to multiuser mode -- far from ideal though I suppose acceptable if this is very well documented and the usual security level restrictions are applied (can't remount the filesystem without -o extattr if security level 0, etc). I think this code is a very good example of why disabled-by-default options are a really bad idea in our kernel. If this code had not spent a decade bit-rotting, I believe its serious problems would have been much more obvious and by now it would have likely been removed from the tree. I think that -- at least -- fsck needs to learn to put the extended attribute database into a state in which it cannot crash the kernel, before this code should be turned on by default. However, given that the extended attributes are treated like file data, that poses a risk of truncation, which could have serious security consequences. So, I think that before this code is turned on by default, it really should to: 1) Be modified to perform all extattr writes synchronously, arranging to flush the disk cache after each. Alternately it might be possible to use WAPBL to preserve the ordering of the extattr and other metadata writes -- but have we ever used WAPBL for file-data-block writes like this? Note that there are tricky corner cases when attributes hold authorization data. For example, what if an application changes the extended attributes, then writes sensitive data to a file, thinking it is protected? It is necessary to order the extattr write and the subsequent data write properly or this is not safe. 2) Be able to repair damaged attribute databases, stopping and demanding manual intervention (manual fsck?) when it detects corruption that might have security or stability consequences. 3) In an ideal world, be reimplemented as a layer rather than a bunch of code inside FFS. Thor !DSPAM:4dfa379f2622059781524! - | Paul Goyette | PGP Key fingerprint: | E-mail addresses: | | Customer Service | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com| | Network Engineer | 0786 F758 55DE 53BA 7731 | pgoyette at juniper.net | | Kernel Developer | | pgoyette at netbsd.org | -
Re: Multiple device attachments
On Thu, 21 Jul 2011, Michael wrote: The examples you site seem to indicate that for example the le device may attach to many alternative devices (e.g. pci, tc, ?), but only one attachment is made when autoconf is complete. I may have read the code examples incorrectly -- please pardon me if I did; but what I want to know is -- can a device have multiple attachments (more than one parent device) when autoconf is complete. What do you mean - several instances of the same driver with different bus backends in the same kernel? Sure, there is absolutely nothing to prevent it. Yes, there can be multiple instances of a driver, with each instance attached to a different attribute. However, each single instance can be attached to only one parent. - | Paul Goyette | PGP Key fingerprint: | E-mail addresses: | | Customer Service | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com| | Network Engineer | 0786 F758 55DE 53BA 7731 | pgoyette at juniper.net | | Kernel Developer | | pgoyette at netbsd.org | -
Re: scsictl equivalent for SATA
drvctl should work for you - this was added in the last couple of months. On Mon, 1 Aug 2011, Manuel Bouyer wrote: On Mon, Aug 01, 2011 at 12:43:03PM +0200, Edgar Fuß wrote: I've got a hot-pluggable SATA drive in a RAID1 that failed. I've never been into this with SATA, only with SCA. What do I do after physically replacing the drive to make the new one known to the kernel? I do know how to re-build the RAID, but what's the analogous to scsictl detach/scan? There is none at this time (unless something changed recently I didn't notice). But if you plugged the new drive in the same slot as the old one, you should be able to use it without extra steps. -- Manuel Bouyer bou...@antioche.eu.org NetBSD: 26 ans d'experience feront toujours la difference -- !DSPAM:4e3696682351153699719! - | Paul Goyette | PGP Key fingerprint: | E-mail addresses: | | Customer Service | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com| | Network Engineer | 0786 F758 55DE 53BA 7731 | pgoyette at juniper.net | | Kernel Developer | | pgoyette at netbsd.org | -
Where are the specific WARNS=n defined?
I'm trying to modularize a couple of drivers, and one of them is generating some gcc errors due to comparison of signed and unsigned values. The driver module is currently being compiled with WARNS=4 (just picked that up from another Makefile). Is there a more appropriate WARNS=n to use to permit the comparison? Or should I simply add CCOPTS+= -Wno-sign-compare (which is used when the driver is being compiled as part of a kernel)? - | Paul Goyette | PGP Key fingerprint: | E-mail addresses: | | Customer Service | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com| | Network Engineer | 0786 F758 55DE 53BA 7731 | pgoyette at juniper.net | | Kernel Developer | | pgoyette at netbsd.org | -
Re: Where are the specific WARNS=n defined?
On Tue, 23 Aug 2011, Christos Zoulas wrote: I'm trying to modularize a couple of drivers, and one of them is generating some gcc errors due to comparison of signed and unsigned values. The driver module is currently being compiled with WARNS=4 (just picked that up from another Makefile). Is there a more appropriate WARNS=n to use to permit the comparison? Or should I simply add CCOPTS+= -Wno-sign-compare (which is used when the driver is being compiled as part of a kernel)? It is best to fix the errors. Yeah, OK! :) Fixing, will commit shortly. - | Paul Goyette | PGP Key fingerprint: | E-mail addresses: | | Customer Service | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com| | Network Engineer | 0786 F758 55DE 53BA 7731 | pgoyette at juniper.net | | Kernel Developer | | pgoyette at netbsd.org | -
wapbl module
It would appear that wapbl is only relevant for ffs file systems (and in particular, only for ffs filesystems with a V2 superblock format). Yet the current modularization of wapbl is not dependant on the ffs module. (wapbl's required-list is empty.) I realize that even though wapbl registers itself as a module, it is not built as a loadable module - ie, it must be built-in. I intend to try to change this, so I would like to know if wapbl is ever intended for non-ffs file systems. FWIW, I also intend to have the ffs code auto-load wapbl whenever it is needed. Thanks in advance for any comments. - | Paul Goyette | PGP Key fingerprint: | E-mail addresses: | | Customer Service | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com| | Network Engineer | 0786 F758 55DE 53BA 7731 | pgoyette at juniper.net | | Kernel Developer | | pgoyette at netbsd.org | -
Re: wapbl module
On Thu, 22 Sep 2011, Joerg Sonnenberger wrote: On Thu, Sep 22, 2011 at 08:24:48AM -0700, Paul Goyette wrote: I realize that even though wapbl registers itself as a module, it is not built as a loadable module - ie, it must be built-in. I intend to try to change this, so I would like to know if wapbl is ever intended for non-ffs file systems. The WAPBL option changes the buffer subsystem, that's why it can't easily be modularized. Ah, OK. Then I guess I'll not dive any further into this! Thanks. - | Paul Goyette | PGP Key fingerprint: | E-mail addresses: | | Customer Service | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com| | Network Engineer | 0786 F758 55DE 53BA 7731 | pgoyette at juniper.net | | Kernel Developer | | pgoyette at netbsd.org | -
prop_dictionary_iterator
In sys/dev/sysmon/sysmon_power.c we have ... while ((obj = prop_object_iterator_next(iter)) != NULL) { prop_dictionary_remove(ped-dict, prop_dictionary_keysym_cstring_nocopy(obj)); prop_object_iterator_reset(iter); } ... The man-page for prop_dictionary_iterator says in part ... Storing to or removing from the dictionary invalidates any active iterators for the dictionary. Doesn't the code-snippet above do exactly what the man-page says should not be done? I've got a machine that lately has decided to panic in the middle of the night. I finally got a traceback which casts great suspicion on the above code. (Unfortunately I managed to reset the machine before I was able to copy the whole traceback; I have now attached a PS2 keyboard, anticipating the next occurrence.) - | Paul Goyette | PGP Key fingerprint: | E-mail addresses: | | Customer Service | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com| | Network Engineer | 0786 F758 55DE 53BA 7731 | pgoyette at juniper.net | | Kernel Developer | | pgoyette at netbsd.org | -
Return status ENXIO / ESRCH in kern_drvctl.c
While browsing the code for other stuff, I ran into what appears to be an inconsistency in drvctl(4). In most cases, when the device specified cannot be found, we return ENXIO - Device not configured. But in routine drvctl_command_get_properties(), if the device is not found, we return ESRCH - No such process. (This is in file sys/kern/kern_drvctl.c rev 1.32 at line 462) It seems to me that this is incorrect, and we should return ENXIO here. As a side note, it would be nice if we had a drvctl(4) man page. :) - | Paul Goyette | PGP Key fingerprint: | E-mail addresses: | | Customer Service | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com| | Network Engineer | 0786 F758 55DE 53BA 7731 | pgoyette at juniper.net | | Kernel Developer | | pgoyette at netbsd.org | -
rw_lock vs mutex
While digging around looking into another problem, I noticed that the piixpm(4) driver uses an rw_lock for its ic_acquire_bus/ic_release_bus routines. ic_acquire_bus() uses rw_enter(..., RW_WRITER) and there doesn't appear to be any use anywhere of RW_READER for that lock. The man page for rw_lock implies that it is a superset of a mutex. So I'm wondering if it makes any sense to use the simpler mutex instead? - | Paul Goyette | PGP Key fingerprint: | E-mail addresses: | | Customer Service | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com| | Network Engineer | 0786 F758 55DE 53BA 7731 | pgoyette at juniper.net | | Kernel Developer | | pgoyette at netbsd.org | -
RE: rw_lock vs mutex
On Thu, 2 Feb 2012, paul_kon...@dell.com wrote: A rw_lock allows multiple readers, correct? If there's a non-trivial probability of concurrent reads that would make a difference. If not, then a mutex would be just as good especially if that is lower overhead. The rwlock in question is contained with the driver's softc. The only exported accessors are i2c_bus_acquire() (which grabs a RW_WRITER lock) and i2c_bus_release(). A mutex makes much more sense. - | Paul Goyette | PGP Key fingerprint: | E-mail addresses: | | Customer Service | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com| | Network Engineer | 0786 F758 55DE 53BA 7731 | pgoyette at juniper.net | | Kernel Developer | | pgoyette at netbsd.org | -
Enable KMEM_GUARD without all the rest of DEBUG ?
On another thread discussing random kernel crashes (likely due to memory corruption) [1], it was suggested that KMEM_GUARD might be a useful tool for debugging... By adding KMEM_GUARD itself to a kernel config file, I discovered that KMEM_GUARD cannot be enabled individually - it is currently required as part of a set of tools that are collectively enabled by 'options DEBUG' /build/netbsd-test/src/sys/kern/subr_kmem.c: In function 'kmem_init': /build/netbsd-test/src/sys/kern/subr_kmem.c:344:20: error: 'kmem_guard' undeclared (first use in this function) /build/netbsd-test/src/sys/kern/subr_kmem.c:344:33: error: 'kmem_guard_depth' undeclared (first use in this function) /build/netbsd-test/src/sys/kern/subr_kmem.c:344:52: error: 'kmem_guard_size' undeclared (first use in this function) The following patches allow building of a kernel with only KMEM_GUARD. The first patch makes the variable allocations depend on the KMEM_GUARD whether or not DEBUG was specified, and the second patch includes the necessary uvm_kmguard.c when either option is specified. Index: subr_kmem.c === RCS file: /cvsroot/src/sys/kern/subr_kmem.c,v retrieving revision 1.42 diff -u -p -r1.42 subr_kmem.c --- sys/kern/subr_kmem.c 5 Feb 2012 03:40:08 - 1.42 +++ sys/kern/subr_kmem.c 8 Mar 2012 01:53:32 - @@ -122,9 +122,6 @@ static pool_cache_t kmem_cache[KMEM_CACH static size_t kmem_cache_maxidx __read_mostly; #if defined(DEBUG) -int kmem_guard_depth = 0; -size_t kmem_guard_size; -static struct uvm_kmguard kmem_guard; static void *kmem_freecheck; #defineKMEM_POISON #defineKMEM_REDZONE @@ -132,6 +129,12 @@ static void *kmem_freecheck; #defineKMEM_GUARD #endif /* defined(DEBUG) */ +#if defined (KMEM_GUARD) +int kmem_guard_depth = 0; +size_t kmem_guard_size; +static struct uvm_kmguard kmem_guard; +#endif + #if defined(KMEM_POISON) static int kmem_poison_ctor(void *, void *, int); static void kmem_poison_fill(void *, size_t); Index: uvm/files.uvm === RCS file: /cvsroot/src/sys/uvm/files.uvm,v retrieving revision 1.20 diff -u -p -r1.20 files.uvm --- sys/uvm/files.uvm 17 May 2011 05:32:31 - 1.20 +++ sys/uvm/files.uvm 8 Mar 2012 02:18:55 - @@ -25,7 +25,7 @@ file uvm/uvm_glue.c file uvm/uvm_init.c file uvm/uvm_io.c file uvm/uvm_km.c -file uvm/uvm_kmguard.c debug +file uvm/uvm_kmguard.c debug | kmem_guard file uvm/uvm_loan.c file uvm/uvm_map.c file uvm/uvm_meter.c Does anyone have any objections to committing these fixes? [1] http://mail-index.netbsd.org/current-users/2012/03/07/msg019419.html - | Paul Goyette | PGP Key fingerprint: | E-mail addresses: | | Customer Service | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com| | Network Engineer | 0786 F758 55DE 53BA 7731 | pgoyette at juniper.net | | Kernel Developer | | pgoyette at netbsd.org | -
Adding SMBus block transfers to iic(4)
There are apparently some devices out there that really need to have i2c block-mode transfers, and it appears that none of our current drivers (at least, none of the mainstream pci drivers) supports this. At first glance, it might appear that we could simply check the transfer requested byte-count, and automatically operate in block mode if the count is greater than 2; the implication is that we could fall back to byte or word transfers for smaller byte counts. But it looks like at least one device out there [1] doesn't support this. So I propose adding a new flag bit that can be passed to iic_exec() to request block-transfer protocols. In src/sys/dev/i2c/i2c_io.h #define I2C_F_BLOCK 0x20 Comments? Suggestions? Alternatives? [1] http://eeepc-linux.googlecode.com/files/9LPR426A.pdf - | Paul Goyette | PGP Key fingerprint: | E-mail addresses: | | Customer Service | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com| | Network Engineer | 0786 F758 55DE 53BA 7731 | pgoyette at juniper.net | | Kernel Developer | | pgoyette at netbsd.org | -
Re: Adding SMBus block transfers to iic(4)
Good point. On Sun, 11 Mar 2012, Mouse wrote: [...] i2c block-mode transfers [...] #define I2C_F_BLOCK 0x20 Comments? Suggestions? Alternatives? BLOCK has a second, quite different, meaning (as in, blocking I/O). It may not apply here, but defining a bit in the interface that can be misunderstood as indicating it does could, at the very least, be confusing. Might I suggest BLKMODE instead of BLOCK? At least to my eye, that's a lot less ambiguous. /~\ The ASCII Mouse \ / Ribbon Campaign X Against HTML mo...@rodents-montreal.org / \ Email! 7D C8 61 52 5D E7 2D 39 4E F1 31 3E E8 B3 27 4B !DSPAM:4f5d315b1981260811860! - | Paul Goyette | PGP Key fingerprint: | E-mail addresses: | | Customer Service | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com| | Network Engineer | 0786 F758 55DE 53BA 7731 | pgoyette at juniper.net | | Kernel Developer | | pgoyette at netbsd.org | -
Re: Adding SMBus block transfers to iic(4)
Just an update here. In order to enable requesting block-mode transfers using the I2C_IOCTL_EXEC method from userland, I'm now proposing to add two new high-level operations to the i2c_op_t enum, along with a new macro predicate: typedef enum { I2C_OP_READ_WITH_STOP = 1, I2C_OP_WRITE= 2, I2C_OP_WRITE_WITH_STOP = 3, + I2C_OP_READ_BLOCK = 5,/* All block operations imply STOP */ + I2C_OP_WRITE_BLOCK = 7 } i2c_op_t; #defineI2C_OP_READ_P(x)(((int)(x) 2) == 0) #defineI2C_OP_WRITE_P(x) (! I2C_OP_READ_P(x)) #defineI2C_OP_STOP_P(x)(((int)(x) 1) != 0) +#defineI2C_OP_BLKMODE_P(x) (((int)(x) 4) != 0) On Sun, 11 Mar 2012, Paul Goyette wrote: There are apparently some devices out there that really need to have i2c block-mode transfers, and it appears that none of our current drivers (at least, none of the mainstream pci drivers) supports this. At first glance, it might appear that we could simply check the transfer requested byte-count, and automatically operate in block mode if the count is greater than 2; the implication is that we could fall back to byte or word transfers for smaller byte counts. But it looks like at least one device out there [1] doesn't support this. So I propose adding a new flag bit that can be passed to iic_exec() to request block-transfer protocols. In src/sys/dev/i2c/i2c_io.h #define I2C_F_BLOCK 0x20 Comments? Suggestions? Alternatives? [1] http://eeepc-linux.googlecode.com/files/9LPR426A.pdf - | Paul Goyette | PGP Key fingerprint: | E-mail addresses: | | Customer Service | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com| | Network Engineer | 0786 F758 55DE 53BA 7731 | pgoyette at juniper.net | | Kernel Developer | | pgoyette at netbsd.org | - !DSPAM:4f5d2d7a1985496917264! - | Paul Goyette | PGP Key fingerprint: | E-mail addresses: | | Customer Service | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com| | Network Engineer | 0786 F758 55DE 53BA 7731 | pgoyette at juniper.net | | Kernel Developer | | pgoyette at netbsd.org | -
Re: CVS commit: src
On Mar 14, 2012, at 10:55 AM, Martin Husemann wrote: On Tue, Mar 13, 2012 at 06:41:18PM +, Elad Efrat wrote: Log Message: Replace the remaining KAUTH_GENERIC_ISSUSER authorization calls with something meaningful. All relevant documentation has been updated or written. Most of these changes were brought up in the following messages: http://mail-index.netbsd.org/tech-kern/2012/01/18/msg012490.html http://mail-index.netbsd.org/tech-kern/2012/01/19/msg012502.html http://mail-index.netbsd.org/tech-kern/2012/02/17/msg012728.html Thanks to christos, manu, njoly, and jmmv for input. Huge thanks to pgoyette for spinning these changes through some build cycles and ATF. This seems to cause deadlocks in the *fs_rename_dir tests. See recent (new) test failures both on i386 and amd64 in fs/vfs/t_vnops. Hmmm, they didn't fail when I tested Elad's changes. Either that, or I missed something in the test output. - | Paul Goyette | PGP Key fingerprint: | E-mail addresses: | | Customer Service | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com| | Network Engineer | 0786 F758 55DE 53BA 7731 | pgoyette at juniper.net | | Kernel Developer | | pgoyette at netbsd.org | -
Re: CVS commit: src/tests/modules
looks good to me... On Wed, 21 Mar 2012, Martin Husemann wrote: On Wed, Mar 21, 2012 at 12:25:25PM +, Martin Husemann wrote: This (untested) would add one (sysctl kern.module.modular would say 0 for non-modular kernels, or 1 otherwise). I had to fix other parts of the kernel, but the patch now works as expected. Any objections? Martin !DSPAM:4f69f0a61987340033397! - | Paul Goyette | PGP Key fingerprint: | E-mail addresses: | | Customer Service | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com| | Network Engineer | 0786 F758 55DE 53BA 7731 | pgoyette at juniper.net | | Kernel Developer | | pgoyette at netbsd.org | -
USB 3.0 ?
Just curious - is anyone out there working on a USB 3.0 driver for NetBSD? - | Paul Goyette | PGP Key fingerprint: | E-mail addresses: | | Customer Service | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com| | Network Engineer | 0786 F758 55DE 53BA 7731 | pgoyette at juniper.net | | Kernel Developer | | pgoyette at netbsd.org | -
Re: introduce device_is_attached()
Along these same lines, it might be nice if there were a way to determine if _any_ instances of a device were attached. This is particularly useful for device-driver modules, which should prevent themselves from being unloaded if there are active instances. A quick check of some device-driver modules show that they simply call config_fini_component() which updates autoconfig tables, but doesn't seem to check if a driver instance exists... On Mon, 16 Apr 2012, Christoph Egger wrote: Hi, I want to introduce a new function to sys/devices.h: bool device_is_attached(device_t parent, cfdata_t cf); The purpose is for bus drivers who wants to attach children and ensure that only one instance of it will attach. 'parent' is the bus driver and 'cf' is the child device as passed to the submatch callback via config_search_loc(). The return value is true if the child is already attached. I implemented a reference usage of it in amdnb_misc.c to ensure that amdtemp only attaches once on rescan. Any comments to the patch? Christoph !DSPAM:4f8c4e7e1986566716057! - | Paul Goyette | PGP Key fingerprint: | E-mail addresses: | | Customer Service | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com| | Network Engineer | 0786 F758 55DE 53BA 7731 | pgoyette at juniper.net | | Kernel Developer | | pgoyette at netbsd.org | -
Module name - recommendations/opinions sought
I'm in the process of modularizing the ieee80211 (Wireless LAN) code, and would like some feedback on what the module's name should be. I can think of at least three or four likely candidates: net80211 ieee80211 wlan wireless-lan I'm leaning towards the simplest one - wlan - but would like to know what others think before I commit. - | Paul Goyette | PGP Key fingerprint: | E-mail addresses: | | Customer Service | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com| | Network Engineer | 0786 F758 55DE 53BA 7731 | pgoyette at juniper.net | | Kernel Developer | | pgoyette at netbsd.org | -
link_set info needed
I've been working on modularizing the ieee80211 code, and I've discovered that the code using a link_set to collect various one-time initialization routines. There is a RUN_ONCE() invocation that uses __link_set_for_each() to iterate over these init routines. This all works just fine when including net80211 in a monolithic kernel, but for some reason, when building as a module, the link_set sections don't seem to get created (they're not listed by objdump -h). __link_set_for_each tries to loop through the items in {start,stop}_link_set_set-name - these items are contributed by other source files. But the symbols that should identify the start and stop of the link set are undefined. Any suggestions on an appropriate alternate to link sets to collect the contributions? If anyone wants to look at the actual code, it is in src/sys/net80211/ieee80211_netbsd.[ch] Thanks! - | Paul Goyette | PGP Key fingerprint: | E-mail addresses: | | Customer Service | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com| | Network Engineer | 0786 F758 55DE 53BA 7731 | pgoyette at juniper.net | | Kernel Developer | | pgoyette at netbsd.org | -
Re: Module name - recommendations/opinions sought
Yes, wlan was my first choice, cimply because that is the name of the config attribute. But it seems a consensus is emerging that the module name should be net80211 so that's what I'm going to do (assuming I figure out the link_set issue). I'll leave the rototilling of the config files for another day. On Thu, 26 Apr 2012, Masao Uebayashi wrote: I thought module names alway match what's define'ed in config(1) files... ... and now I've found it's wlan, which should be changed to a saner name like net80211. On Thu, Apr 26, 2012 at 9:52 AM, Paul Goyette p...@whooppee.com wrote: I'm in the process of modularizing the ieee80211 (Wireless LAN) code, and would like some feedback on what the module's name should be. I can think of at least three or four likely candidates: net80211 ieee80211 wlan wireless-lan I'm leaning towards the simplest one - wlan - but would like to know what others think before I commit. - | Paul Goyette | PGP Key fingerprint: | E-mail addresses: | | Customer Service | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com | | Network Engineer | 0786 F758 55DE 53BA 7731 | pgoyette at juniper.net | | Kernel Developer | | pgoyette at netbsd.org | - !DSPAM:4f98c4322402014054294! - | Paul Goyette | PGP Key fingerprint: | E-mail addresses: | | Customer Service | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com| | Network Engineer | 0786 F758 55DE 53BA 7731 | pgoyette at juniper.net | | Kernel Developer | | pgoyette at netbsd.org | -
Modularizing net80211 (was: link_set info needed)
Folks, I would like to commit the attached patches to modularize the net80211 code. The changes include: 1. Protecting the inclusion of opt_inet.h with #ifdef _KERNEL 2. Removing CTLFLAG_PERMANENT from all of the sysctl(8) variables 3. Replacing the use of link_set with direct calls to all of the crypto/auth initializers. At a future time, I propose to split the crypto/auth stuff out into their own modules, but that will require a bit more planning and a lot more testing. :) I have tested the attached changes on my home system, using a modular if_rum(4) driver (which I plan to commit separately). Comments/feedback welcomed... - | Paul Goyette | PGP Key fingerprint: | E-mail addresses: | | Customer Service | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com| | Network Engineer | 0786 F758 55DE 53BA 7731 | pgoyette at juniper.net | | Kernel Developer | | pgoyette at netbsd.org | -Index: ieee80211.c === RCS file: /cvsroot/src/sys/net80211/ieee80211.c,v retrieving revision 1.53 diff -u -p -r1.53 ieee80211.c --- ieee80211.c 5 Apr 2010 07:22:24 - 1.53 +++ ieee80211.c 27 Apr 2012 12:34:49 - @@ -43,7 +43,9 @@ __KERNEL_RCSID(0, $NetBSD: ieee80211.c, * IEEE 802.11 generic handler */ +#if defined(_KERNEL_OPT) #include opt_inet.h +#endif #include sys/param.h #include sys/systm.h Index: ieee80211_crypto.c === RCS file: /cvsroot/src/sys/net80211/ieee80211_crypto.c,v retrieving revision 1.15 diff -u -p -r1.15 ieee80211_crypto.c --- ieee80211_crypto.c 23 May 2011 15:37:36 - 1.15 +++ ieee80211_crypto.c 27 Apr 2012 12:34:49 - @@ -39,7 +39,9 @@ __FBSDID($FreeBSD: src/sys/net80211/iee __KERNEL_RCSID(0, $NetBSD: ieee80211_crypto.c,v 1.15 2011/05/23 15:37:36 drochner Exp $); #endif +#if defined(_KERNEL_OPT) #include opt_inet.h +#endif /* * IEEE 802.11 generic crypto support. Index: ieee80211_input.c === RCS file: /cvsroot/src/sys/net80211/ieee80211_input.c,v retrieving revision 1.72 diff -u -p -r1.72 ieee80211_input.c --- ieee80211_input.c 31 Dec 2011 20:41:58 - 1.72 +++ ieee80211_input.c 27 Apr 2012 12:34:49 - @@ -39,7 +39,9 @@ __FBSDID($FreeBSD: src/sys/net80211/iee __KERNEL_RCSID(0, $NetBSD: ieee80211_input.c,v 1.72 2011/12/31 20:41:58 christos Exp $); #endif +#if defined(_KERNEL_OPT) #include opt_inet.h +#endif #ifdef __NetBSD__ #endif /* __NetBSD__ */ Index: ieee80211_ioctl.c === RCS file: /cvsroot/src/sys/net80211/ieee80211_ioctl.c,v retrieving revision 1.57 diff -u -p -r1.57 ieee80211_ioctl.c --- ieee80211_ioctl.c 31 Dec 2011 20:41:58 - 1.57 +++ ieee80211_ioctl.c 27 Apr 2012 12:34:50 - @@ -43,8 +43,10 @@ __KERNEL_RCSID(0, $NetBSD: ieee80211_io * IEEE 802.11 ioctl support (FreeBSD-specific) */ +#if defined(_KERNEL_OPT) #include opt_inet.h #include opt_compat_netbsd.h +#endif #include sys/endian.h #include sys/param.h Index: ieee80211_netbsd.c === RCS file: /cvsroot/src/sys/net80211/ieee80211_netbsd.c,v retrieving revision 1.20 diff -u -p -r1.20 ieee80211_netbsd.c --- ieee80211_netbsd.c 19 Nov 2011 22:51:25 - 1.20 +++ ieee80211_netbsd.c 27 Apr 2012 12:34:50 - @@ -40,6 +40,7 @@ __KERNEL_RCSID(0, $NetBSD: ieee80211_ne #include sys/kernel.h #include sys/systm.h #include sys/mbuf.h +#include sys/module.h #include sys/proc.h #include sys/sysctl.h #include sys/once.h @@ -72,6 +73,8 @@ static int ieee80211_sysctl_node(SYSCTLF intieee80211_debug = 0; #endif +#ifdef NOTYET /* Currently we can't use link sets in modules */ + typedef void (*ieee80211_setup_func)(void); __link_set_decl(ieee80211_funcs, ieee80211_setup_func); @@ -81,7 +84,7 @@ ieee80211_init0(void) { ieee80211_setup_func * const *ieee80211_setup, f; -__link_set_foreach(ieee80211_setup, ieee80211_funcs) { + __link_set_foreach(ieee80211_setup, ieee80211_funcs) { f = (void*)*ieee80211_setup; (*f)(); } @@ -96,6 +99,29 @@ ieee80211_init(void) RUN_ONCE(ieee80211_init_once, ieee80211_init0); } +#else +/* + * One-time initialization + * + * XXX Make sure you update the list of CYPTO initializers if new + * XXX mechanisms are added! + */ + +void ccmp_register(void); +void tkip_register(void); +void wep_register(void); +void ieee80211_external_auth_setup(void); + +void +ieee80211_init(void) +{ + + ccmp_register(); + tkip_register(); + wep_register(); + ieee80211_external_auth_setup(); +} +#endif /* NOT_YET */ static
Re: Modularizing net80211 (was: link_set info needed)
On Fri, 27 Apr 2012, Joerg Sonnenberger wrote: On Fri, Apr 27, 2012 at 05:51:30AM -0700, Paul Goyette wrote: 3. Replacing the use of link_set with direct calls to all of the crypto/auth initializers. I believe this to a noticable regression due to bugs in the module framework. It should support either link sets or _init sections. If the latter is supposed to be the way forward, current linkset usage should be adjusted and an appropiate place in the kernel for calling all static initializers be found. Thanks for the feedback. I am not familiar with the _init sections stuff - can you provide a reference or example? Please also note that separating the crypto/auth stuff into their own modules later on will remove these direct calls to the initializers. - | Paul Goyette | PGP Key fingerprint: | E-mail addresses: | | Customer Service | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com| | Network Engineer | 0786 F758 55DE 53BA 7731 | pgoyette at juniper.net | | Kernel Developer | | pgoyette at netbsd.org | -