memory allocation in spinlock context
I am trying to understand if it is possible to allow memory allocations (M_NOWAIT, of course) in a spinlock context. I do not see any obvious architectural obstacles. But the fact that all of the uma locks, system map lock, object locks, page queue locks and so on are regular mutexes makes it impossible to allocate memory without violating the fundamental lock ordering rules. Could all of the above mentioned locks potentially be converted to spin mutexes? (And that seems to be a large nasty change) Are there any alternative possibilities? BTW, currently we have at least one place where a memory allocation of this kind is done stealthily (and thus dangerously?). ACPI resume code must execute AcpiLeaveSleepStatePrep with interrupts disabled and ACPICA code performs memory allocations in that code path. Since the interrupts are disabled by means of intr_disable(), witness(9) and similar are completely oblivious of the fact. -- Andriy Gapon ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
Re: memory allocation in spinlock context
On 3/1/2013 5:50 AM, Andriy Gapon wrote: I am trying to understand if it is possible to allow memory allocations (M_NOWAIT, of course) in a spinlock context. There are mechanisms to do just this- essentially by creating private pools that are organized in a way to allow for spinlock (and thus possible interrupt level) safe allocations (and failure if the pool is empty). Are you trying to make a general mechanism for this? ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
Re: memory allocation in spinlock context
On Fri, Mar 1, 2013 at 2:50 PM, Andriy Gapon a...@freebsd.org wrote: I am trying to understand if it is possible to allow memory allocations (M_NOWAIT, of course) in a spinlock context. I do not see any obvious architectural obstacles. But the fact that all of the uma locks, system map lock, object locks, page queue locks and so on are regular mutexes makes it impossible to allocate memory without violating the fundamental lock ordering rules. Could all of the above mentioned locks potentially be converted to spin mutexes? (And that seems to be a large nasty change) AFAIK they're suitable for particular uses and not in general. For example if the critical section is short, spinning rather than sleeping could avoid a potential context switches, increasing performances. OTOH has the disadvantage of wasting time that could be used in other activities. So, IMHO, if such a change need to be adopted, ti should be pondered/profiled more than a bit, and I doubt it could be used for the wide class of locks you mentioned. Are there any alternative possibilities? Is there anything that prevent you to drop the lock, perform the allocation in a reliable fashion (M_WAITOK) and try to reacquire the lock later on? BTW, currently we have at least one place where a memory allocation of this kind is done stealthily (and thus dangerously?). ACPI resume code must execute AcpiLeaveSleepStatePrep with interrupts disabled and ACPICA code performs memory allocations in that code path. Since the interrupts are disabled by means of intr_disable(), witness(9) and similar are completely oblivious of the fact. -- Andriy Gapon ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org Thanks, -- Davide ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
Re: memory allocation in spinlock context
on 01/03/2013 16:22 Matthew Jacob said the following: On 3/1/2013 5:50 AM, Andriy Gapon wrote: I am trying to understand if it is possible to allow memory allocations (M_NOWAIT, of course) in a spinlock context. There are mechanisms to do just this- essentially by creating private pools that are organized in a way to allow for spinlock (and thus possible interrupt level) safe allocations (and failure if the pool is empty). Are you trying to make a general mechanism for this? I am just pondering what would it take to develop such a general mechanism. -- Andriy Gapon ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
Re: looking for someone to fix humanize_number (test cases included)
Clifton Royston wrote this message on Tue, Dec 25, 2012 at 09:46 -1000: On Tue, Dec 25, 2012 at 08:23:55AM -1000, Clifton Royston wrote: On Tue, Dec 25, 2012 at 07:20:37AM -1000, Clifton Royston wrote: On Mon, Dec 24, 2012 at 12:00:01PM +, freebsd-hackers-requ...@freebsd.org wrote: From: John-Mark Gurney j...@funkthat.com To: hack...@freebsd.org Subject: looking for someone to fix humanize_number (test cases included) I'm looking for a person who is interested in fixing up humanize_number. ... So I decided to write a test program to test the output, and now I'm even more surprised by the output... Neither 7.2-R nor 10-current give what I expect are the correct results... ... I am bemused. I correct myself: the function works fine, and there are no bugs I could find, though it's clear the man page could emphasize the correct usage a bit more. I had to read the source several times and start on debugging it before I understood the correct usage of the flag values with the scale and flags parameters, despite the man page stating: The following flags may be passed in scale: HN_AUTOSCALE Format the buffer using the lowest multiplier pos- sible. HN_GETSCALE Return the prefix index number (the number of times number must be divided to fit) instead of formatting it to the buffer. The following flags may be passed in flags: HN_DECIMAL If the final result is less than 10, display it using one digit. ... HN_DIVISOR_1000 Divide number with 1000 instead of 1024. That is, certain flags must be passed in flags and others must only be passed in scale - a bit counter-intuitive. Also, scale == 0 is clearly not interpreted as AUTOSCALE, but I am not yet clear how it is being handled - it seems somewhat like AUTOSCALE but not identical. When the test program constant table is updated to pass the scale flags as specified, as well as fixing the bugs mentioned in the previous emails, it all passes except for the one (intentional?) inconsistency that k is used in place of K if HN_DECIMAL is enabled. The bug in the transfer speed results which prompted this inquiry suggests that perhaps some clients of humanize_number in the codebase are also passing the scale parameters incorrectly. I would propose accepting HN_AUTOSCALE and HN_GETSCALE in the flags field (they don't overlap with other values) while continuing to accept them in the scale field for backwards compatibility. Trivial diff below. Sorry I didn't get back to this, but now I have a few minutes... + getscale = (flags | scale) HN_GETSCALE; This isn't good: #define HN_IEC_PREFIXES 0x10 #define HN_GETSCALE 0x10 If someone sets HN_IEC_PREFIXES, they'll acidentally enable _GETSCALE.. We could do something anoying by changing the value of _GETSCALE, and then leaving some legacy code to accept the old _GETSCALE on the scale input... This would let new code work, but would break new code on old libraries... So, I don't see an easy way to fix this... -- John-Mark Gurney Voice: +1 415 225 5579 All that I will do, has been done, All that I have, has not. ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
Re: memory allocation in spinlock context
On 3/1/13 5:50 AM, Andriy Gapon wrote: I am trying to understand if it is possible to allow memory allocations (M_NOWAIT, of course) in a spinlock context. I do not see any obvious architectural obstacles. But the fact that all of the uma locks, system map lock, object locks, page queue locks and so on are regular mutexes makes it impossible to allocate memory without violating the fundamental lock ordering rules. Could all of the above mentioned locks potentially be converted to spin mutexes? (And that seems to be a large nasty change) Are there any alternative possibilities? BTW, currently we have at least one place where a memory allocation of this kind is done stealthily (and thus dangerously?). ACPI resume code must execute AcpiLeaveSleepStatePrep with interrupts disabled and ACPICA code performs memory allocations in that code path. Since the interrupts are disabled by means of intr_disable(), witness(9) and similar are completely oblivious of the fact. Typically the need for such a facility means that the locks are being held for too long. I think someone has suggested using a private allocator carving out of a pre-allocated space. Depending on the subsystem you are allocating for this may work for you. I am looking to do this for the kernel gzip routines so that we can do compressed kernel dumps as soon as I verify the bounds of the gzip allocations. -Alfred ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
Re: [patch] statfs does not detect -t nullfs -o union as a union mount
On Thu, Feb 28, 2013 at 12:21:44AM +0200, Konstantin Belousov wrote: On Wed, Feb 27, 2013 at 10:31:42PM +0100, Jilles Tjoelker wrote: While testing recent changes to opendir(), I noticed that fstatfs() does not return the MNT_UNION flag for a -t nullfs -o union mount. As a result, opendir()/readdir() return files that exist in both top and bottom directories twice (at least . and ..). Other -o union mounts and -t unionfs mounts work correctly in this regard. The below patch passes through just the MNT_UNION flag of the nullfs mount itself. Perhaps more flags should be passed through. commit fce32a779af4eb977c9b96feb6e4f811d89f2881 Author: Jilles Tjoelker jil...@stack.nl Date: Sat Feb 23 22:22:39 2013 +0100 nullfs: Preserve the MNT_UNION flag of the nullfs mount itself. This is needed so that opendir() can properly detect a union mount like mount -t nullfs -o union dir1 dir2. diff --git a/sys/fs/nullfs/null_vfsops.c b/sys/fs/nullfs/null_vfsops.c index 3724e0a..ff06f57 100644 --- a/sys/fs/nullfs/null_vfsops.c +++ b/sys/fs/nullfs/null_vfsops.c @@ -313,7 +313,7 @@ nullfs_statfs(mp, sbp) /* now copy across the interesting information and fake the rest */ sbp-f_type = mstat.f_type; - sbp-f_flags = mstat.f_flags; + sbp-f_flags = (sbp-f_flags MNT_UNION) | mstat.f_flags; sbp-f_bsize = mstat.f_bsize; sbp-f_iosize = mstat.f_iosize; sbp-f_blocks = mstat.f_blocks; Would it make sense to preserve more flags from the upper mount ? I see a use for MNT_NOEXEC as well, at least. Yes, preserving MNT_NOEXEC will make -t nullfs -o noexec work better, in particular rtld's check for libraries loaded via environment variables. In the same way MNT_RDONLY, MNT_NOSUID and MNT_NOSYMFOLLOW could be preserved. On the other hand, MNT_ROOTFS should probably not be passed through from the underlying filesystem. This would give sbp-f_flags = (sbp-f_flags (MNT_RDONLY | MNT_NOEXEC | MNT_NOSUID | MNT_UNION | MNT_NOSYMFOLLOW) | (mstat.f_flags ~MNT_ROOTFS); -- Jilles Tjoelker ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
Re: [patch] statfs does not detect -t nullfs -o union as a union mount
On Sat, Mar 02, 2013 at 01:00:33AM +0100, Jilles Tjoelker wrote: On Thu, Feb 28, 2013 at 12:21:44AM +0200, Konstantin Belousov wrote: On Wed, Feb 27, 2013 at 10:31:42PM +0100, Jilles Tjoelker wrote: While testing recent changes to opendir(), I noticed that fstatfs() does not return the MNT_UNION flag for a -t nullfs -o union mount. As a result, opendir()/readdir() return files that exist in both top and bottom directories twice (at least . and ..). Other -o union mounts and -t unionfs mounts work correctly in this regard. The below patch passes through just the MNT_UNION flag of the nullfs mount itself. Perhaps more flags should be passed through. commit fce32a779af4eb977c9b96feb6e4f811d89f2881 Author: Jilles Tjoelker jil...@stack.nl Date: Sat Feb 23 22:22:39 2013 +0100 nullfs: Preserve the MNT_UNION flag of the nullfs mount itself. This is needed so that opendir() can properly detect a union mount like mount -t nullfs -o union dir1 dir2. diff --git a/sys/fs/nullfs/null_vfsops.c b/sys/fs/nullfs/null_vfsops.c index 3724e0a..ff06f57 100644 --- a/sys/fs/nullfs/null_vfsops.c +++ b/sys/fs/nullfs/null_vfsops.c @@ -313,7 +313,7 @@ nullfs_statfs(mp, sbp) /* now copy across the interesting information and fake the rest */ sbp-f_type = mstat.f_type; - sbp-f_flags = mstat.f_flags; + sbp-f_flags = (sbp-f_flags MNT_UNION) | mstat.f_flags; sbp-f_bsize = mstat.f_bsize; sbp-f_iosize = mstat.f_iosize; sbp-f_blocks = mstat.f_blocks; Would it make sense to preserve more flags from the upper mount ? I see a use for MNT_NOEXEC as well, at least. Yes, preserving MNT_NOEXEC will make -t nullfs -o noexec work better, in particular rtld's check for libraries loaded via environment variables. In the same way MNT_RDONLY, MNT_NOSUID and MNT_NOSYMFOLLOW could be preserved. On the other hand, MNT_ROOTFS should probably not be passed through from the underlying filesystem. This would give sbp-f_flags = (sbp-f_flags (MNT_RDONLY | MNT_NOEXEC | MNT_NOSUID | MNT_UNION | MNT_NOSYMFOLLOW) | (mstat.f_flags ~MNT_ROOTFS); I think this is fine. pgpbxLGJDTdz9.pgp Description: PGP signature