Re: Summary: Re: Spin down HDD after disk sync or before power off
Garrett Cooper gcoo...@freebsd.org writes: Agreed. Spinning down at reboot isn't smart and seems like a good way to kill a disk quicker. *not* spinning down at halt is far worse. Most modern disks are rated for hundreds of thousands of load-unload cycles, but far fewer emergency unloads (which is what happens when the drive loses power while still spinning). DES -- Dag-Erling Smørgrav - d...@des.no ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
Re: is vfs.lookup_shared unsafe in 7.3?
Hello, Trying to overtake high server load (sudden peaks of 15%us/85%sy, LA 40, very slow lstat() at these moments, looks like some kind of lock contention) I enabled vfs.lookup_shared=1 on two servers today. One is FreeBSD-7.3 kernel csup'ed and built Sep 9 2010 and other is FreeBSD-7.3 csup'ed and built Jul 16 2010. The server with more fresh kernel is running nice and does not show high load anymore. But on the second server it did not help. More, after a few hours of work with vfs.lookup_shared=1 I noticed processes stucked in ufs state. I tried to kill them with no luck. Disabling vfs.lookup_shared freezed the whole system. So, is vfs.lookup_shared=1 unsafe in 7.3? Did it become more stable between 16 Jul and 9 Sep (is it the reason why first system is still running?), or should I expect that it will freeze in a near time too? Thanks in advance! No, 7.3 has a bug that can cause these hangs that is probably made worse by vfs.lookup_shared=1, but can occur even if it is disabled. You want these fixes applied (in order, one of them reverts part of another): Thank you for the fix and for the explanation, that's exactly what I wanted to know. Just to be sure: do these patches completely fix the bug with hangs (even without vfs.lookup_shared=1)? -- // cronfy ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
Re: Summary: Re: Spin down HDD after disk sync or before power off
Alexander Best wrote: On Wed Sep 15 10, Oliver Fromme wrote: Warren Block wbl...@wonkity.com wrote: [...] 8. Alexander Motin has an updated CAM version of the ATA system which will eventually replace the existing one. In -CURRENT, anyway. He was kind enough to look at my event handler. My understanding is that he is looking at implementing the head parking/standby mechanism in that new code. The patch below will work with the new CAM ATA driver (i.e. ada(4) disks). It adds a sysctl, so you can switch the spin-down off if you're going to just reboot: # sysctl kern.cam.ada.spindown_shutdown=0 i haven't tested your patch yet, but i don't think deciding whether to spin down the hdd should be decided merely from the sysctl value. It was the most simple and least intrusive way to introduce some means to switch it on and off. Of course there might be better ways to do it. You're welcome to submit your own patch. the hdd should spindown when a shutdown has been issued and not spindown, if a reboot has been issued. Right. That's why my shutdown wrapper script sets the sysctl to 0 when the -r option is present (I've got that wrapper script for ages, for different reasons). Also, there are cases where it is completely impossible to decide automatically whether the disks should be spun down or not. For example, if the admin issues a shutdown -h (halt), there's no way for the OS to know in advance whether the admin is going to switch the machine off or reboot to multi-user. So there must be a way for the user to forcibly enable/disable the spindown feature. I think a sysctl is the most appropriate way to do that, isn't it? Actually, my plan is to have a mask of two bits for the sysctl (the default value would be 3): - bit 0: enable (1) or disable (0) spindown - bit 1: automatic (1) or manual (0) setting With the default setting (i.e. bit 1 == 1), at shutdown time some facility would look at the reboot(2) howto flags and then set bit 0 to either 0 or 1. There are several ways where to handle that. For example, init(8) could be modified to pass the howto value to rc.shutdown (which could be useful for other purposes, too). Then a standard rc.d script could handle the spindown sysctl. The advantage of that solution would be maximum flexibility, because the actual logic is implemented in an rc.d script. deciding whether freebsd reboots or shuts down cannot be done from a script, since users might use the reboot or halt commands in which case (if i'm not mistaken) all shutdown scripts get skipped. Right, which is why it is a rather bad idea to use halt(8) or reboot(8), except in an emergency. Actually I think the manpages and handbook should strongly discourage it, and recommend to use shutdown(8) or init(8) instead, both of which send a signal to PID 1 by default, so rc.shutdown is executed properly. Best regards Oliver -- Oliver Fromme, secnetix GmbH Co. KG, Marktplatz 29, 85567 Grafing b. M. Handelsregister: Registergericht Muenchen, HRA 74606, Geschäftsfuehrung: secnetix Verwaltungsgesellsch. mbH, Handelsregister: Registergericht Mün- chen, HRB 125758, Geschäftsführer: Maik Bachmann, Olaf Erb, Ralf Gebhart FreeBSD-Dienstleistungen, -Produkte und mehr: http://www.secnetix.de/bsd If Java had true garbage collection, most programs would delete themselves upon execution. -- Robert Sewell ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
Re: is vfs.lookup_shared unsafe in 7.3?
On Thursday, September 16, 2010 3:53:47 am cronfy wrote: Hello, Trying to overtake high server load (sudden peaks of 15%us/85%sy, LA 40, very slow lstat() at these moments, looks like some kind of lock contention) I enabled vfs.lookup_shared=1 on two servers today. One is FreeBSD-7.3 kernel csup'ed and built Sep 9 2010 and other is FreeBSD-7.3 csup'ed and built Jul 16 2010. The server with more fresh kernel is running nice and does not show high load anymore. But on the second server it did not help. More, after a few hours of work with vfs.lookup_shared=1 I noticed processes stucked in ufs state. I tried to kill them with no luck. Disabling vfs.lookup_shared freezed the whole system. So, is vfs.lookup_shared=1 unsafe in 7.3? Did it become more stable between 16 Jul and 9 Sep (is it the reason why first system is still running?), or should I expect that it will freeze in a near time too? Thanks in advance! No, 7.3 has a bug that can cause these hangs that is probably made worse by vfs.lookup_shared=1, but can occur even if it is disabled. You want these fixes applied (in order, one of them reverts part of another): Thank you for the fix and for the explanation, that's exactly what I wanted to know. Just to be sure: do these patches completely fix the bug with hangs (even without vfs.lookup_shared=1)? Yes. -- John Baldwin ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
Re: odd issues with DDB vs GDB
On Wednesday, September 15, 2010 8:01:19 pm Patrick Mahan wrote: All, I am trying to debug a system hang occurring on my HP Proliant G6 running some of our kernel software. I am seeing that under certain test loads, the system will hang-up complete, no keyboard, no console, etc. I suspect it is some of the kernel code that I have inherited that contains a lot of locking (lots of data structure, each having their own mutex lock (sleepable)). You need to use 'kgdb' rather than 'gdb' on kernel.debug. -- John Baldwin ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
Re: traling whitespace in CFLAGS if make.conf:CPUTYPE is not defined/empty
On Wednesday, September 15, 2010 9:01:20 pm Alexander Best wrote: hi there, after discovering PR #114082 i noticed that with CPUTYPE not being defined in make.conf, `make -VCFLAGS` reports a trailing whitespace for CFLAGS. the reason for this is that ${_CPUCFLAGS} gets added to CFLAGS even if it's empty. the following patch should take care of the problem. i also added the same logik to COPTFLAGS. although i wasn't able to trigger the trailing whitespace, it should still introduce a cleaner behaviour. Does the trailing whitespace break anything? In the past we have had a non-empty default CPU CFLAGS (e.g. using '-mtune=pentiumpro' on i386 at one point IIRC) which this change would break. Unless the trailing whitespace is causing non-cosmetic problems I'd probably just leave it as it is. Also, if we were to go with this approach, I would not have changed kern.pre.mk at all, but set both NO_CPU_CFLAGS and NO_CPU_COPTFLAGS in bsd.cpu.mk when CPUTYPE was empty. -- John Baldwin ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
Re: traling whitespace in CFLAGS if make.conf:CPUTYPE is not defined/empty
On Thu Sep 16 10, John Baldwin wrote: On Wednesday, September 15, 2010 9:01:20 pm Alexander Best wrote: hi there, after discovering PR #114082 i noticed that with CPUTYPE not being defined in make.conf, `make -VCFLAGS` reports a trailing whitespace for CFLAGS. the reason for this is that ${_CPUCFLAGS} gets added to CFLAGS even if it's empty. the following patch should take care of the problem. i also added the same logik to COPTFLAGS. although i wasn't able to trigger the trailing whitespace, it should still introduce a cleaner behaviour. Does the trailing whitespace break anything? In the past we have had a non-empty default CPU CFLAGS (e.g. using '-mtune=pentiumpro' on i386 at one point IIRC) which this change would break. Unless the trailing whitespace is causing non-cosmetic problems I'd probably just leave it as it is. the PR claims that a few ports are having problems with trailing whitespaces during ./configure, but personally i haven't experienced any problems. however i don't use the port system a lot so i'm not really able to comment on that. cheers. alex Also, if we were to go with this approach, I would not have changed kern.pre.mk at all, but set both NO_CPU_CFLAGS and NO_CPU_COPTFLAGS in bsd.cpu.mk when CPUTYPE was empty. -- John Baldwin -- a13x ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
Re: Summary: Re: Spin down HDD after disk sync or before power off
On Thursday 16 September 2010 10:41:07 Oliver Fromme wrote: Alexander Best wrote: On Wed Sep 15 10, Oliver Fromme wrote: The patch below will work with the new CAM ATA driver (i.e. ada(4) disks). It adds a sysctl, so you can switch the spin-down off if you're going to just reboot: # sysctl kern.cam.ada.spindown_shutdown=0 the hdd should spindown when a shutdown has been issued and not spindown, if a reboot has been issued. Right. That's why my shutdown wrapper script sets the sysctl to 0 when the -r option is present (I've got that wrapper script for ages, for different reasons). Also, there are cases where it is completely impossible to decide automatically whether the disks should be spun down or not. For example, if the admin issues a shutdown -h (halt), there's no way for the OS to know in advance whether the admin is going to switch the machine off or reboot to multi-user. So there must be a way for the user to forcibly enable/disable the spindown feature. I think a sysctl is the most appropriate way to do that, isn't it? I would just spin down the disk in case of a halt. An unwanted spin down is harmless compared to an emergency shutdown and usually the intention is to power off rather than reboot. Part of your patch modifies ada_shutdown. That function already gets the reboot(2) howto flags passed to it, so you could test for (howto (RB_HALT | RB_POWEROFF)) != 0 before issuing the STANDBY command. There's no need to make this more complicated with a sysctl that can override this in my opinion. Also command2 should be command1 in this line: + if (cgd-ident_data.support.command2 ATA_SUPPORT_POWERMGT) signature.asc Description: This is a digitally signed message part.
Re: Summary: Re: Spin down HDD after disk sync or before power off
Tijl Coosemans wrote: On Thursday 16 September 2010 10:41:07 Oliver Fromme wrote: Also, there are cases where it is completely impossible to decide automatically whether the disks should be spun down or not. For example, if the admin issues a shutdown -h (halt), there's no way for the OS to know in advance whether the admin is going to switch the machine off or reboot to multi-user. So there must be a way for the user to forcibly enable/disable the spindown feature. I think a sysctl is the most appropriate way to do that, isn't it? I would just spin down the disk in case of a halt. An unwanted spin down is harmless compared to an emergency shutdown and usually the intention is to power off rather than reboot. Is it? When I intend to power-off, I use shutdown -p, not shutdown -h. Quite often (but not always) when I halt a machine, I'm going to reboot to multi-user, not power off. In that case I certainly wouldn't want to spin the drives down and have them spun up immediately afterwards. I don't think that weartear caused by that procedure is completely insignificant (although it's certainly less of a problem than emergency unloads). For that reason I definitely want to have a way to disable the spindown function manually. Part of your patch modifies ada_shutdown. That function already gets the reboot(2) howto flags passed to it, so you could test for (howto (RB_HALT | RB_POWEROFF)) != 0 before issuing the STANDBY command. Right, good point. I didn't notice because the shutdown function in ad(4) doesn't get the howto flag, so I assumed (without checking) that ada(4) doesn't get it either. There's no need to make this more complicated with a sysctl that can override this in my opinion. I'm afraid I have to disagree (see above). Apart from that, there's nothing complicated at all about a sysctl. Also command2 should be command1 in this line: + if (cgd-ident_data.support.command2 ATA_SUPPORT_POWERMGT) Oops ... You're right. Thanks for pointing that out. Best regards Oliver -- Oliver Fromme, secnetix GmbH Co. KG, Marktplatz 29, 85567 Grafing b. M. Handelsregister: Registergericht Muenchen, HRA 74606, Geschäftsfuehrung: secnetix Verwaltungsgesellsch. mbH, Handelsregister: Registergericht Mün- chen, HRB 125758, Geschäftsführer: Maik Bachmann, Olaf Erb, Ralf Gebhart FreeBSD-Dienstleistungen, -Produkte und mehr: http://www.secnetix.de/bsd I made up the term 'object-oriented', and I can tell you I didn't have C++ in mind. -- Alan Kay, OOPSLA '97 ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
Re: Summary: Re: Spin down HDD after disk sync or before power off
On Thu, 16 Sep 2010, Alexander Best wrote: On Wed Sep 15 10, Oliver Fromme wrote: Warren Block wbl...@wonkity.com wrote: [...] 8. Alexander Motin has an updated CAM version of the ATA system which will eventually replace the existing one. In -CURRENT, anyway. He was kind enough to look at my event handler. My understanding is that he is looking at implementing the head parking/standby mechanism in that new code. The patch below will work with the new CAM ATA driver (i.e. ada(4) disks). It adds a sysctl, so you can switch the spin-down off if you're going to just reboot: # sysctl kern.cam.ada.spindown_shutdown=0 i haven't tested your patch yet, but i don't think deciding whether to spin down the hdd should be decided merely from the sysctl value. the hdd should spindown when a shutdown has been issued and not spindown, if a reboot has been issued. It's been a while, but the problem I found when comparing the NetBSD code was that there didn't appear to be a way to tell from within the FreeBSD driver whether it was a shutdown or reboot. ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
Re: Summary: Re: Spin down HDD after disk sync or before power off
On 09/16/10 09:42, Warren Block wrote: On Thu, 16 Sep 2010, Alexander Best wrote: On Wed Sep 15 10, Oliver Fromme wrote: Warren Block wbl...@wonkity.com wrote: [...] 8. Alexander Motin has an updated CAM version of the ATA system which will eventually replace the existing one. In -CURRENT, anyway. He was kind enough to look at my event handler. My understanding is that he is looking at implementing the head parking/standby mechanism in that new code. The patch below will work with the new CAM ATA driver (i.e. ada(4) disks). It adds a sysctl, so you can switch the spin-down off if you're going to just reboot: # sysctl kern.cam.ada.spindown_shutdown=0 i haven't tested your patch yet, but i don't think deciding whether to spin down the hdd should be decided merely from the sysctl value. the hdd should spindown when a shutdown has been issued and not spindown, if a reboot has been issued. It's been a while, but the problem I found when comparing the NetBSD code was that there didn't appear to be a way to tell from within the FreeBSD driver whether it was a shutdown or reboot. Register a shutdown event handler? The second argument can be tested against RB_HALT to determine what is happening. -Nathan ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
Re: Summary: Re: Spin down HDD after disk sync or before power off
On Thursday 16 September 2010 16:10:22 Oliver Fromme wrote: Tijl Coosemans wrote: I would just spin down the disk in case of a halt. An unwanted spin down is harmless compared to an emergency shutdown and usually the intention is to power off rather than reboot. Is it? When I intend to power-off, I use shutdown -p, not shutdown -h. Quite often (but not always) when I halt a machine, I'm going to reboot to multi-user, not power off. Hmm, I suppose support for power off is ubiquitous nowadays. It used to be that halt meant: bring the system in a state where we can safely cut the power. In that case it makes sense to let halt spin down the disks. If you intend to reboot why not explicitly reboot rather than halt? Also, to go from single to multi user mode you can just exit(1) the shell. In that case I certainly wouldn't want to spin the drives down and have them spun up immediately afterwards. I don't think that weartear caused by that procedure is completely insignificant (although it's certainly less of a problem than emergency unloads). For that reason I definitely want to have a way to disable the spindown function manually. Ok, I'm soft on the sysctl really, it wouldn't hurt anyone. Although, if the intention is to just override the default behaviour at the time of shutdown you might as well just add an option to halt(8). A don't spin down disks option would fit in with the other options there. signature.asc Description: This is a digitally signed message part.
Re: Summary: Re: Spin down HDD after disk sync or before power off
Tijl Coosemans wrote: On Thursday 16 September 2010 16:10:22 Oliver Fromme wrote: Tijl Coosemans wrote: I would just spin down the disk in case of a halt. An unwanted spin down is harmless compared to an emergency shutdown and usually the intention is to power off rather than reboot. Is it? When I intend to power-off, I use shutdown -p, not shutdown -h. Quite often (but not always) when I halt a machine, I'm going to reboot to multi-user, not power off. Hmm, I suppose support for power off is ubiquitous nowadays. It used to be that halt meant: bring the system in a state where we can safely cut the power. In that case it makes sense to let halt spin down the disks. If you intend to reboot why not explicitly reboot rather than halt? For example, I use shutdown -h in order to swap disks that are not hot-swappable, or other kind of hardware work that can be done while the machine is switched on. Of course, in that particular case the disk which is about to be swapped out should be spun down, while the others should not. But that's not a problem because I can use atacontrol(8) and camcontrol(8) to spin down a specific disk drive manually. Also, to go from single to multi user mode you can just exit(1) the shell. Yes, of course, that's a different matter. I've updated the patch for ada(4). It includes a bug fix (command1 vs. command2) and uses the howto flags passed to the shutdown function. Thanks again for pointing these out. Best regards Oliver --- ata_da.c.orig 2010-05-23 18:16:33.0 +0200 +++ ata_da.c2010-09-16 17:21:10.0 +0200 @@ -42,6 +42,7 @@ #include sys/eventhandler.h #include sys/malloc.h #include sys/cons.h +#include sys/reboot.h #include geom/geom_disk.h #endif /* _KERNEL */ @@ -79,7 +80,8 @@ ADA_FLAG_CAN_TRIM = 0x080, ADA_FLAG_OPEN = 0x100, ADA_FLAG_SCTX_INIT = 0x200, - ADA_FLAG_CAN_CFA= 0x400 + ADA_FLAG_CAN_CFA= 0x400, + ADA_FLAG_CAN_POWERMGT = 0x800 } ada_flags; typedef enum { @@ -180,6 +182,10 @@ #defineADA_DEFAULT_SEND_ORDERED1 #endif +#ifndefADA_DEFAULT_SPINDOWN_SHUTDOWN +#defineADA_DEFAULT_SPINDOWN_SHUTDOWN 1 +#endif + /* * Most platforms map firmware geometry to actual, but some don't. If * not overridden, default to nothing. @@ -191,6 +197,7 @@ static int ada_retry_count = ADA_DEFAULT_RETRY; static int ada_default_timeout = ADA_DEFAULT_TIMEOUT; static int ada_send_ordered = ADA_DEFAULT_SEND_ORDERED; +static int ada_spindown_shutdown = ADA_DEFAULT_SPINDOWN_SHUTDOWN; SYSCTL_NODE(_kern_cam, OID_AUTO, ada, CTLFLAG_RD, 0, CAM Direct Access Disk driver); @@ -203,6 +210,9 @@ SYSCTL_INT(_kern_cam_ada, OID_AUTO, ada_send_ordered, CTLFLAG_RW, ada_send_ordered, 0, Send Ordered Tags); TUNABLE_INT(kern.cam.ada.ada_send_ordered, ada_send_ordered); +SYSCTL_INT(_kern_cam_ada, OID_AUTO, spindown_shutdown, CTLFLAG_RW, + ada_spindown_shutdown, 0, Spin down upon shutdown); +TUNABLE_INT(kern.cam.ada.spindown_shutdown, ada_spindown_shutdown); /* * ADA_ORDEREDTAG_INTERVAL determines how often, relative @@ -665,6 +675,8 @@ softc-flags |= ADA_FLAG_CAN_48BIT; if (cgd-ident_data.support.command2 ATA_SUPPORT_FLUSHCACHE) softc-flags |= ADA_FLAG_CAN_FLUSHCACHE; + if (cgd-ident_data.support.command1 ATA_SUPPORT_POWERMGT) + softc-flags |= ADA_FLAG_CAN_POWERMGT; if (cgd-ident_data.satacapabilities ATA_SUPPORT_NCQ cgd-inq_flags SID_CmdQue) softc-flags |= ADA_FLAG_CAN_NCQ; @@ -1222,6 +1234,58 @@ /*getcount_only*/0); cam_periph_unlock(periph); } + + if (ada_spindown_shutdown == 0 || + (howto (RB_HALT | RB_POWEROFF)) == 0) + return; + + DELAY(50); + + TAILQ_FOREACH(periph, adadriver.units, unit_links) { + union ccb ccb; + + /* If we paniced with lock held - not recurse here. */ + if (cam_periph_owned(periph)) + continue; + cam_periph_lock(periph); + softc = (struct ada_softc *)periph-softc; + /* +* We only spin-down the drive if it is capable of it.. +*/ + if ((softc-flags ADA_FLAG_CAN_POWERMGT) == 0) { + cam_periph_unlock(periph); + continue; + } + + /* XXX Hide this behind bootverbose? */ + xpt_print(periph-path, spin-down\n); + + xpt_setup_ccb(ccb.ccb_h, periph-path, CAM_PRIORITY_NORMAL); + + ccb.ccb_h.ccb_state = ADA_CCB_DUMP; + cam_fill_ataio(ccb.ataio, + 1, + adadone, +
Re: odd issues with DDB vs GDB
John Baldwin wrote: On Wednesday, September 15, 2010 8:01:19 pm Patrick Mahan wrote: All, I am trying to debug a system hang occurring on my HP Proliant G6 running some of our kernel software. I am seeing that under certain test loads, the system will hang-up complete, no keyboard, no console, etc. I suspect it is some of the kernel code that I have inherited that contains a lot of locking (lots of data structure, each having their own mutex lock (sleepable)). You need to use 'kgdb' rather than 'gdb' on kernel.debug. Doh! *-( I'm so used to gdb even though I use kgdb for looking at crash dumps. Thanks, Patrick ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
Re: Questions about mutex implementation in kern/kern_mutex.c
On Wed, Sep 15, 2010 at 08:46:00AM -0700, Matthew Fleming wrote: I'll take a stab at answering these... On Wed, Sep 15, 2010 at 6:44 AM, Andrey Simonenko si...@comsys.ntu-kpi.kiev.ua wrote: Hello, I have questions about mutex implementation in kern/kern_mutex.c and sys/mutex.h files (current versions of these files): 1. Is the following statement correct for a volatile pointer or integer variable: if a volatile variable is updated by the compare-and-set instruction (e.g. atomic_cmpset_ptr(val, ...)), then the current value of such variable can be read without any special instruction (e.g. v = val)? I checked Assembler code for a function with v = val and val = v like statements generated for volatile variable and simple variable and found differences: on ia64 v = val was implemented by ld.acq and val = v was implemented by st.rel; on mips and sparc64 Assembler code can have different order of lines for volatile and simple variable (depends on the code of a function). I think this depends somewhat on the hardware and what you mean by current value. Current value means that the value of a variable read by one thread is equal to the value of this variable successfully updated by another thread by the compare-and-set instruction. As I understand from the kernel source code, atomic_cmpset_ptr() allows to update a variable in a way that all other CPUs will invalidate corresponding cache lines that contain the value of this variable. The mtx_owned(9) macro uses this property, mtx_owned() does not use anything special to compare the value of m-mtx_lock (volatile) with current thread pointer, all other functions that update m-mtx_lock of unowned mutex use compare-and-set instruction. Also I cannot find anything special in generated Assembler code for volatile variables (except for ia64 where acquire loads and release stores are used). If you want a value that is not in-flux, then something like atomic_cmpset_ptr() setting to the current value is needed, so that you force any other atomic_cmpset to fail. However, since there is no explicit lock involved, there is no strong meaning for current value and a read that does not rely on a value cached in a register is likely sufficient. While the volatile keyword in C has no explicit hardware meaning, it often means that a load from memory (or, presumably, L1-L3 cache) is required. The volatile keyword here and all questions are related to the base C compiler, current version and currently supported architectures in FreeBSD. Yes, here under volatile I want to say that the value of a variable is not cached in a register and it is referenced by its address in all commands. There are some places in the kernel where a variable is updated in something like do { v = value; } while (!atomic_cmpset_int(value, ...)); and that variable is not volatile, but the compiler generates correct Assembler code. So volatile is not a requirement for all cases. 2. Let there is a default (sleep) mutex and adaptive mutexes is enabled. A thread tries to obtain lock quickly and fails, _mtx_lock_sleep() is called, it gets the address of the current mutex's owner thread and checks whether that owner thread is running (on another CPU). How does _mtx_lock_sleep() know that that thread still exists (lines 311-337 in kern_mutex.c)? When adaptive mutexes was implemented there was explicit locking around adaptive mutexes code. When turnstile in mutex code was implemented that's locking logic was changed. It appears that it's possible for the thread pointer to be recycled between fetching the value of owner and looking at TD_IS_RUNNING. On actual hardware, this race is unlikely to occur due to the time it takes for a thread to release a lock and perform all of thread exit code before the struct thread is returned to the uma zone. However, even once returned to the uma zone on many FreeBSD implementations the access is safe as the address of the thread is still dereferenceable, due to the implementation of uma zones. I checked exactly this scenario, that's why asked this question to verify my understanding. 3. Why there is no any memory barrier in mtx_init()? If another thread (on another CPU) finds that mutex is initialized using mtx_initialized() then it can mtx_lock() it and mtx_lock() it second time, as a result mtx_recurse field will be increased, but its value still can be uninitialized on architecture with relaxed memory ordering model. It seems to me that it's generally a programming error to rely on the return of mtx_initialized(), as there is no serialization with e.g. a thread calling mtx_destroy(). A fully correct serialization model would require that a single thread initialize the mtx and then create any worker threads that will use the mtx. I agree that this should not happen in practice. Another thread can get a pointer to just initialized mutex and
Re: Summary: Re: Spin down HDD after disk sync or before power off
On Thu, 16 Sep 2010 09:17:52 +0200 Dag-Erling Smørgrav d...@des.no wrote: Garrett Cooper gcoo...@freebsd.org writes: Agreed. Spinning down at reboot isn't smart and seems like a good way to kill a disk quicker. *not* spinning down at halt is far worse. Most modern disks are rated for hundreds of thousands of load-unload cycles, but far fewer emergency unloads (which is what happens when the drive loses power while still spinning). As I understand it wear from spinning-down used to come from the head actually scraping the disk surface as it lost lift, parking placed the head on a disposable area, but modern drives take the head off the disk altogether. When Hitachi was specifying 300,000 unloads, they said that in testing the drives were still working at 1,000,000, someone quoted 600,000 as the current spec. At these levels you can be spinning the drives down and up ever few minutes for the normal lifetime of the drive. Even on very old drives I doubt reboot are much of a problem, they're rare on servers. On laptops and desktops they're rare compared to shutdowns and suspends. ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
Re: Questions about mutex implementation in kern/kern_mutex.c
On Thursday, September 16, 2010 1:33:07 pm Andrey Simonenko wrote: On Wed, Sep 15, 2010 at 08:46:00AM -0700, Matthew Fleming wrote: I'll take a stab at answering these... On Wed, Sep 15, 2010 at 6:44 AM, Andrey Simonenko si...@comsys.ntu-kpi.kiev.ua wrote: Hello, I have questions about mutex implementation in kern/kern_mutex.c and sys/mutex.h files (current versions of these files): 1. Is the following statement correct for a volatile pointer or integer variable: if a volatile variable is updated by the compare-and-set instruction (e.g. atomic_cmpset_ptr(val, ...)), then the current value of such variable can be read without any special instruction (e.g. v = val)? I checked Assembler code for a function with v = val and val = v like statements generated for volatile variable and simple variable and found differences: on ia64 v = val was implemented by ld.acq and val = v was implemented by st.rel; on mips and sparc64 Assembler code can have different order of lines for volatile and simple variable (depends on the code of a function). I think this depends somewhat on the hardware and what you mean by current value. Current value means that the value of a variable read by one thread is equal to the value of this variable successfully updated by another thread by the compare-and-set instruction. As I understand from the kernel source code, atomic_cmpset_ptr() allows to update a variable in a way that all other CPUs will invalidate corresponding cache lines that contain the value of this variable. That is not true. It is likely true on x86, but it is certainly not true on other architectures such as sparc64 where a write may be held in a store buffer for an indeterminate amount of time (and note that some lock releases are simple stores with a rel memory barrier). All that we require is that if the value is stale, the atomic_cmpset() that attempts to set MTX_CONTESTED will fail. The mtx_owned(9) macro uses this property, mtx_owned() does not use anything special to compare the value of m-mtx_lock (volatile) with current thread pointer, all other functions that update m-mtx_lock of unowned mutex use compare-and-set instruction. Also I cannot find anything special in generated Assembler code for volatile variables (except for ia64 where acquire loads and release stores are used). No, mtx_owned() is just not harmed by the races it loses. You can certainly read a stale value of mtx_lock in mtx_owned() if some other thread owns the lock or has just released the lock. However, we don't care, because in both of those cases, mtx_owned() returns false. What does matter is that mtx_owned() can only return true if we currently hold the mutex. This works because 1) the same thread cannot call mtx_unlock() and mtx_owned() at the same time, and 2) even CPUs that hold writes in store buffers will snoop their store buffer for local reads on that CPU. That is, a given CPU will never read a stale value of a memory word that is older than a write it has performed to that word. If you want a value that is not in-flux, then something like atomic_cmpset_ptr() setting to the current value is needed, so that you force any other atomic_cmpset to fail. However, since there is no explicit lock involved, there is no strong meaning for current value and a read that does not rely on a value cached in a register is likely sufficient. While the volatile keyword in C has no explicit hardware meaning, it often means that a load from memory (or, presumably, L1-L3 cache) is required. The volatile keyword here and all questions are related to the base C compiler, current version and currently supported architectures in FreeBSD. Yes, here under volatile I want to say that the value of a variable is not cached in a register and it is referenced by its address in all commands. There are some places in the kernel where a variable is updated in something like do { v = value; } while (!atomic_cmpset_int(value, ...)); and that variable is not volatile, but the compiler generates correct Assembler code. So volatile is not a requirement for all cases. Hmm, I suspect that many of those places actually do use volatile. The various lock cookies (mtx_lock, etc.) are declared volatile in the structure. Otherwise the compiler would be free to conclude that 'v = value;' is a loop invariant and move it out of the loop which would break. Given that, the construct you referred to does in fact require 'value' to be volatile. -- John Baldwin ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
race conditions for destroying and opening a dev
Has anyone seen this scenario before? I am seeing it in RELENG_7, but the code in question exists through to head. Thread 1: (kgdb) where #0 sched_switch (td=0xff003a04ea80, newtd=0xff00210b4000, flags=Variable flags is not available. ) at ../../../kern/sched_ule.c:1944 #1 0x803b6091 in mi_switch (flags=1, newtd=0x0) at ../../../kern/kern_synch.c:450 #2 0x80402399 in sleepq_switch (wchan=0xff8413b50b60) at ../../../kern/subr_sleepqueue.c:497 #3 0x80402e8c in sleepq_timedwait (wchan=0xff8413b50b60) at ../../../kern/subr_sleepqueue.c:615 #4 0x803b682d in _sleep (ident=0xff8413b50b60, lock=0x80b0ee00, priority=76, wmesg=0x806583bb devdrn, timo=100) at ../../../kern/kern_synch.c:228 #5 0x8037640c in destroy_devl (dev=0xff003aaf) at ../../../kern/kern_conf.c:874 #6 0x80376759 in destroy_dev (dev=0xff003aaf) at ../../../kern/kern_conf.c:916 #7 0x8034c939 in g_dev_orphan (cp=0xff003a544800) at ../../../geom/geom_dev.c:438 #8 0x803506a0 in g_run_events () at ../../../geom/geom_event.c:164 #9 0x80351f1c in g_event_procbody () at ../../../geom/geom_kern.c:141 #10 0x8038a73a in fork_exit (callout=0x80351eb0 g_event_procbody at ../../../geom/geom_kern.c:132, arg=0x0, frame=0xff8413b50c80) at ../../../kern/kern_fork.c:829 #11 0x805a747e in fork_trampoline () at ../../../amd64/amd64/exception.S:564 #12 0x in ?? () This thread is waiting on the threadcount to go away- i.e., the last close of the device to occur (da16 in this case). Thread 2: (kgdb) where #0 sched_switch (td=0xff009bb4ca80, newtd=0xff003af43380, flags=Variable flags is not available. ) at ../../../kern/sched_ule.c:1944 #1 0x803b6091 in mi_switch (flags=1, newtd=0x0) at ../../../kern/kern_synch.c:450 #2 0x80402399 in sleepq_switch (wchan=0x80b0e040) at ../../../kern/subr_sleepqueue.c:497 #3 0x80402f84 in sleepq_wait (wchan=0x80b0e040) at ../../../kern/subr_sleepqueue.c:580 #4 0x803b5385 in _sx_xlock_hard (sx=0x80b0e040, tid=18446742976810240640, opts=Variable opts is not available. ) at ../../../kern/kern_sx.c:562 #5 0x803b5731 in _sx_xlock (sx=0x80b0e040, opts=0, file=0x80652d27 ../../../geom/geom_dev.c, line=196) at sx.h:154 #6 0x8034d1bc in g_dev_open (dev=0xff003aaf, flags=1, fmt=Variable fmt is not available. ) at ../../../geom/geom_dev.c:196 #7 0x80333741 in devfs_open (ap=0xff841dea88b0) at ../../../fs/devfs/devfs_vnops.c:902 #8 0x80601daf in VOP_OPEN_APV (vop=0x8089fb80, a=0xff841dea88b0) at vnode_if.c:371 #9 0x80467246 in vn_open_cred (ndp=0xff841dea8a00, flagp=0xff841dea894c, cmode=Variable cmode is not available. ) at vnode_if.h:199 #10 0x80463770 in kern_open (td=0xff009bb4ca80, path=0x5114a0 Address 0x5114a0 out of bounds, pathseg=Variable pathseg is not available. ) at ../../../kern/vfs_syscalls.c:1054 #11 0x805c599e in syscall (frame=0xff841dea8c80) at ../../../amd64/amd64/trap.c:911 #12 0x805a723b in Xfast_syscall () at ../../../amd64/amd64/exception.S:349 #13 0x0008009a219c in ?? () This thread was opening the device, bumped the refcount, but then wedged on the geom topology lock . the refcount field is protected under devmtx Anyone seen this? I'm half inclined to either add in CDP_SCHED_DTR when one calls destroy_dev, or make dev_refthread look at CDP_ACTIVE, leaning more toward the latter. Any thoughts on this? ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
Re: race conditions for destroying and opening a dev
On Thu, Sep 16, 2010 at 12:00:29PM -0700, Matthew Jacob wrote: Has anyone seen this scenario before? I am seeing it in RELENG_7, but the code in question exists through to head. Thread 1: (kgdb) where #0 sched_switch (td=0xff003a04ea80, newtd=0xff00210b4000, flags=Variable flags is not available. ) at ../../../kern/sched_ule.c:1944 #1 0x803b6091 in mi_switch (flags=1, newtd=0x0) at ../../../kern/kern_synch.c:450 #2 0x80402399 in sleepq_switch (wchan=0xff8413b50b60) at ../../../kern/subr_sleepqueue.c:497 #3 0x80402e8c in sleepq_timedwait (wchan=0xff8413b50b60) at ../../../kern/subr_sleepqueue.c:615 #4 0x803b682d in _sleep (ident=0xff8413b50b60, lock=0x80b0ee00, priority=76, wmesg=0x806583bb devdrn, timo=100) at ../../../kern/kern_synch.c:228 #5 0x8037640c in destroy_devl (dev=0xff003aaf) at ../../../kern/kern_conf.c:874 #6 0x80376759 in destroy_dev (dev=0xff003aaf) at ../../../kern/kern_conf.c:916 #7 0x8034c939 in g_dev_orphan (cp=0xff003a544800) at ../../../geom/geom_dev.c:438 #8 0x803506a0 in g_run_events () at ../../../geom/geom_event.c:164 #9 0x80351f1c in g_event_procbody () at ../../../geom/geom_kern.c:141 #10 0x8038a73a in fork_exit (callout=0x80351eb0 g_event_procbody at ../../../geom/geom_kern.c:132, arg=0x0, frame=0xff8413b50c80) at ../../../kern/kern_fork.c:829 #11 0x805a747e in fork_trampoline () at ../../../amd64/amd64/exception.S:564 #12 0x in ?? () This thread is waiting on the threadcount to go away- i.e., the last close of the device to occur (da16 in this case). Thread 2: (kgdb) where #0 sched_switch (td=0xff009bb4ca80, newtd=0xff003af43380, flags=Variable flags is not available. ) at ../../../kern/sched_ule.c:1944 #1 0x803b6091 in mi_switch (flags=1, newtd=0x0) at ../../../kern/kern_synch.c:450 #2 0x80402399 in sleepq_switch (wchan=0x80b0e040) at ../../../kern/subr_sleepqueue.c:497 #3 0x80402f84 in sleepq_wait (wchan=0x80b0e040) at ../../../kern/subr_sleepqueue.c:580 #4 0x803b5385 in _sx_xlock_hard (sx=0x80b0e040, tid=18446742976810240640, opts=Variable opts is not available. ) at ../../../kern/kern_sx.c:562 #5 0x803b5731 in _sx_xlock (sx=0x80b0e040, opts=0, file=0x80652d27 ../../../geom/geom_dev.c, line=196) at sx.h:154 #6 0x8034d1bc in g_dev_open (dev=0xff003aaf, flags=1, fmt=Variable fmt is not available. ) at ../../../geom/geom_dev.c:196 #7 0x80333741 in devfs_open (ap=0xff841dea88b0) at ../../../fs/devfs/devfs_vnops.c:902 #8 0x80601daf in VOP_OPEN_APV (vop=0x8089fb80, a=0xff841dea88b0) at vnode_if.c:371 #9 0x80467246 in vn_open_cred (ndp=0xff841dea8a00, flagp=0xff841dea894c, cmode=Variable cmode is not available. ) at vnode_if.h:199 #10 0x80463770 in kern_open (td=0xff009bb4ca80, path=0x5114a0 Address 0x5114a0 out of bounds, pathseg=Variable pathseg is not available. ) at ../../../kern/vfs_syscalls.c:1054 #11 0x805c599e in syscall (frame=0xff841dea8c80) at ../../../amd64/amd64/trap.c:911 #12 0x805a723b in Xfast_syscall () at ../../../amd64/amd64/exception.S:349 #13 0x0008009a219c in ?? () This thread was opening the device, bumped the refcount, but then wedged on the geom topology lock . the refcount field is protected under devmtx Anyone seen this? I'm half inclined to either add in CDP_SCHED_DTR when one calls destroy_dev, or make dev_refthread look at CDP_ACTIVE, leaning more toward the latter. Any thoughts on this? And who owns the topology lock ? Is it thread 1 ? Destroy_devl() clears si_devsw for departing cdev, and *refthread() checks si_devsw against NULL as an indicator of device destruction in progress. I think that this situation is what destroy_dev_sched(9) was created for. pgpQbeqwkz8X0.pgp Description: PGP signature
Re: race conditions for destroying and opening a dev
On Thu, Sep 16, 2010 at 12:00 PM, Matthew Jacob m...@feral.com wrote: Has anyone seen this scenario before? I am seeing it in RELENG_7, but the code in question exists through to head. Thread 1: (kgdb) where #0 sched_switch (td=0xff003a04ea80, newtd=0xff00210b4000, flags=Variable flags is not available. ) at ../../../kern/sched_ule.c:1944 #1 0x803b6091 in mi_switch (flags=1, newtd=0x0) at ../../../kern/kern_synch.c:450 #2 0x80402399 in sleepq_switch (wchan=0xff8413b50b60) at ../../../kern/subr_sleepqueue.c:497 #3 0x80402e8c in sleepq_timedwait (wchan=0xff8413b50b60) at ../../../kern/subr_sleepqueue.c:615 #4 0x803b682d in _sleep (ident=0xff8413b50b60, lock=0x80b0ee00, priority=76, wmesg=0x806583bb devdrn, timo=100) at ../../../kern/kern_synch.c:228 #5 0x8037640c in destroy_devl (dev=0xff003aaf) at ../../../kern/kern_conf.c:874 #6 0x80376759 in destroy_dev (dev=0xff003aaf) at ../../../kern/kern_conf.c:916 #7 0x8034c939 in g_dev_orphan (cp=0xff003a544800) at ../../../geom/geom_dev.c:438 #8 0x803506a0 in g_run_events () at ../../../geom/geom_event.c:164 #9 0x80351f1c in g_event_procbody () at ../../../geom/geom_kern.c:141 #10 0x8038a73a in fork_exit (callout=0x80351eb0 g_event_procbody at ../../../geom/geom_kern.c:132, arg=0x0, frame=0xff8413b50c80) at ../../../kern/kern_fork.c:829 #11 0x805a747e in fork_trampoline () at ../../../amd64/amd64/exception.S:564 #12 0x in ?? () This thread is waiting on the threadcount to go away- i.e., the last close of the device to occur (da16 in this case). Thread 2: (kgdb) where #0 sched_switch (td=0xff009bb4ca80, newtd=0xff003af43380, flags=Variable flags is not available. ) at ../../../kern/sched_ule.c:1944 #1 0x803b6091 in mi_switch (flags=1, newtd=0x0) at ../../../kern/kern_synch.c:450 #2 0x80402399 in sleepq_switch (wchan=0x80b0e040) at ../../../kern/subr_sleepqueue.c:497 #3 0x80402f84 in sleepq_wait (wchan=0x80b0e040) at ../../../kern/subr_sleepqueue.c:580 #4 0x803b5385 in _sx_xlock_hard (sx=0x80b0e040, tid=18446742976810240640, opts=Variable opts is not available. ) at ../../../kern/kern_sx.c:562 #5 0x803b5731 in _sx_xlock (sx=0x80b0e040, opts=0, file=0x80652d27 ../../../geom/geom_dev.c, line=196) at sx.h:154 #6 0x8034d1bc in g_dev_open (dev=0xff003aaf, flags=1, fmt=Variable fmt is not available. ) at ../../../geom/geom_dev.c:196 #7 0x80333741 in devfs_open (ap=0xff841dea88b0) at ../../../fs/devfs/devfs_vnops.c:902 #8 0x80601daf in VOP_OPEN_APV (vop=0x8089fb80, a=0xff841dea88b0) at vnode_if.c:371 #9 0x80467246 in vn_open_cred (ndp=0xff841dea8a00, flagp=0xff841dea894c, cmode=Variable cmode is not available. ) at vnode_if.h:199 #10 0x80463770 in kern_open (td=0xff009bb4ca80, path=0x5114a0 Address 0x5114a0 out of bounds, pathseg=Variable pathseg is not available. ) at ../../../kern/vfs_syscalls.c:1054 #11 0x805c599e in syscall (frame=0xff841dea8c80) at ../../../amd64/amd64/trap.c:911 #12 0x805a723b in Xfast_syscall () at ../../../amd64/amd64/exception.S:349 #13 0x0008009a219c in ?? () This thread was opening the device, bumped the refcount, but then wedged on the geom topology lock . the refcount field is protected under devmtx Anyone seen this? I'm half inclined to either add in CDP_SCHED_DTR when one calls destroy_dev, or make dev_refthread look at CDP_ACTIVE, leaning more toward the latter. Any thoughts on this? We had a similar bug at Isilon, but in our case it was in cam/scsi/scsi_pass.c::passcleanup() calling destroy_dev(). We switched it to destroy_dev_sched() to fix the si_threadcount deadlock. Cheers, matthew ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
Re: race conditions for destroying and opening a dev
kostik, matthew- thanks mucho! ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
Re: Summary: Re: Spin down HDD after disk sync or before power off
On Thu, 16 Sep 2010, Oliver Fromme wrote: I've updated the patch for ada(4). It includes a bug fix (command1 vs. command2) and uses the howto flags passed to the shutdown function. Thanks again for pointing these out. Works perfectly on a system here. Thanks! ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
Re: Questions about mutex implementation in kern/kern_mutex.c
On Thu, 16 Sep 2010, John Baldwin wrote: On Thursday, September 16, 2010 1:33:07 pm Andrey Simonenko wrote: The mtx_owned(9) macro uses this property, mtx_owned() does not use anything special to compare the value of m-mtx_lock (volatile) with current thread pointer, all other functions that update m-mtx_lock of unowned mutex use compare-and-set instruction. Also I cannot find anything special in generated Assembler code for volatile variables (except for ia64 where acquire loads and release stores are used). No, mtx_owned() is just not harmed by the races it loses. You can certainly read a stale value of mtx_lock in mtx_owned() if some other thread owns the lock or has just released the lock. However, we don't care, because in both of those cases, mtx_owned() returns false. What does matter is that mtx_owned() can only return true if we currently hold the mutex. This works because 1) the same thread cannot call mtx_unlock() and mtx_owned() at the same time, and 2) even CPUs that hold writes in store buffers will snoop their store buffer for local reads on that CPU. That is, a given CPU will never read a stale value of a memory word that is older than a write it has performed to that word. Sorry for the naive question, but would you mind expounding a bit on what keeps the thread from migrating to a different CPU and getting a stale value there? (I can imagine a couple possible mechanisms, but don't know enough to know which one(s) are the real ones.) Thanks, Ben Kaduk ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org