memory allocation in spinlock context

2013-03-01 Thread Andriy Gapon

I am trying to understand if it is possible to allow memory allocations 
(M_NOWAIT,
of course) in a spinlock context.
I do not see any obvious architectural obstacles.
But the fact that all of the uma locks, system map lock, object locks, page 
queue
locks and so on are regular mutexes makes it impossible to allocate memory 
without
violating the fundamental lock ordering rules.

Could all of the above mentioned locks potentially be converted to spin mutexes?
(And that seems to be a large nasty change)
Are there any alternative possibilities?

BTW, currently we have at least one place where a memory allocation of this kind
is done stealthily (and thus dangerously?).  ACPI resume code must execute
AcpiLeaveSleepStatePrep with interrupts disabled and ACPICA code performs memory
allocations in that code path.  Since the interrupts are disabled by means of
intr_disable(), witness(9) and similar are completely oblivious of the fact.

-- 
Andriy Gapon
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: memory allocation in spinlock context

2013-03-01 Thread Matthew Jacob

On 3/1/2013 5:50 AM, Andriy Gapon wrote:

I am trying to understand if it is possible to allow memory allocations 
(M_NOWAIT,
of course) in a spinlock context.

There are mechanisms to do just this- essentially by creating private 
pools that are organized in a way to allow for spinlock (and thus 
possible interrupt level) safe allocations (and failure if the pool is 
empty). Are you trying to make a general mechanism for this?

___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: memory allocation in spinlock context

2013-03-01 Thread Davide Italiano
On Fri, Mar 1, 2013 at 2:50 PM, Andriy Gapon a...@freebsd.org wrote:

 I am trying to understand if it is possible to allow memory allocations 
 (M_NOWAIT,
 of course) in a spinlock context.
 I do not see any obvious architectural obstacles.
 But the fact that all of the uma locks, system map lock, object locks, page 
 queue
 locks and so on are regular mutexes makes it impossible to allocate memory 
 without
 violating the fundamental lock ordering rules.

 Could all of the above mentioned locks potentially be converted to spin 
 mutexes?
 (And that seems to be a large nasty change)

AFAIK they're suitable for particular uses and not in general.
For example if the critical section is short, spinning rather than
sleeping could avoid a potential context switches, increasing
performances. OTOH has the disadvantage of wasting time that could be
used in other activities. So, IMHO, if such a change need to be
adopted, ti should be pondered/profiled more than a bit, and I doubt
it could be used for the wide class of locks you mentioned.

 Are there any alternative possibilities?


Is there anything that prevent you to drop the lock, perform the
allocation in a reliable fashion (M_WAITOK) and try to reacquire the
lock later on?

 BTW, currently we have at least one place where a memory allocation of this 
 kind
 is done stealthily (and thus dangerously?).  ACPI resume code must execute
 AcpiLeaveSleepStatePrep with interrupts disabled and ACPICA code performs 
 memory
 allocations in that code path.  Since the interrupts are disabled by means of
 intr_disable(), witness(9) and similar are completely oblivious of the fact.

 --
 Andriy Gapon
 ___
 freebsd-hackers@freebsd.org mailing list
 http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
 To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org

Thanks,

-- 
Davide
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: memory allocation in spinlock context

2013-03-01 Thread Andriy Gapon
on 01/03/2013 16:22 Matthew Jacob said the following:
 On 3/1/2013 5:50 AM, Andriy Gapon wrote:
 I am trying to understand if it is possible to allow memory allocations 
 (M_NOWAIT,
 of course) in a spinlock context.

 There are mechanisms to do just this- essentially by creating private pools 
 that
 are organized in a way to allow for spinlock (and thus possible interrupt 
 level)
 safe allocations (and failure if the pool is empty). Are you trying to make a
 general mechanism for this?

I am just pondering what would it take to develop such a general mechanism.

-- 
Andriy Gapon
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: looking for someone to fix humanize_number (test cases included)

2013-03-01 Thread John-Mark Gurney
Clifton Royston wrote this message on Tue, Dec 25, 2012 at 09:46 -1000:
 On Tue, Dec 25, 2012 at 08:23:55AM -1000, Clifton Royston wrote:
  On Tue, Dec 25, 2012 at 07:20:37AM -1000, Clifton Royston wrote:
   On Mon, Dec 24, 2012 at 12:00:01PM +, 
   freebsd-hackers-requ...@freebsd.org wrote:
From: John-Mark Gurney j...@funkthat.com
To: hack...@freebsd.org
Subject: looking for someone to fix humanize_number (test cases
included)

I'm looking for a person who is interested in fixing up humanize_number.
  ...
So I decided to write a test program to test the output, and now I'm 
even
more surprised by the output...  Neither 7.2-R nor 10-current give what
I expect are the correct results...
  ...
  
I am bemused.
 
   I correct myself: the function works fine, and there are no bugs I
 could find, though it's clear the man page could emphasize the correct
 usage a bit more.
 
   I had to read the source several times and start on debugging it
 before I understood the correct usage of the flag values with the scale
 and flags parameters, despite the man page stating:
 
  The following flags may be passed in scale:
 
HN_AUTOSCALE Format the buffer using the lowest multiplier pos-
 sible.
HN_GETSCALE  Return the prefix index number (the number of
 times number must be divided to fit) instead of
 formatting it to the buffer.
 
  The following flags may be passed in flags:
 
HN_DECIMAL   If the final result is less than 10, display it
 using one digit.
 ...
HN_DIVISOR_1000  Divide number with 1000 instead of 1024.
 
   That is, certain flags must be passed in flags and others must only
 be passed in scale - a bit counter-intuitive.  Also, scale == 0 is
 clearly not interpreted as AUTOSCALE, but I am not yet clear how it is
 being handled - it seems somewhat like AUTOSCALE but not identical.
 
   When the test program constant table is updated to pass the scale
 flags as specified, as well as fixing the bugs mentioned in the
 previous emails, it all passes except for the one (intentional?)
 inconsistency that k is used in place of K if HN_DECIMAL is
 enabled.
 
   The bug in the transfer speed results which prompted this inquiry
 suggests that perhaps some clients of humanize_number in the codebase
 are also passing the scale parameters incorrectly.  I would propose
 accepting HN_AUTOSCALE and HN_GETSCALE in the flags field (they don't
 overlap with other values) while continuing to accept them in the scale
 field for backwards compatibility.  Trivial diff below.

Sorry I didn't get back to this, but now I have a few minutes...

 + getscale  = (flags | scale)  HN_GETSCALE;

This isn't good:
#define HN_IEC_PREFIXES 0x10
#define HN_GETSCALE 0x10

If someone sets HN_IEC_PREFIXES, they'll acidentally enable _GETSCALE..

We could do something anoying by changing the value of _GETSCALE, and
then leaving some legacy code to accept the old _GETSCALE on the scale
input...  This would let new code work, but would break new code on
old libraries...  So, I don't see an easy way to fix this...

-- 
  John-Mark Gurney  Voice: +1 415 225 5579

 All that I will do, has been done, All that I have, has not.
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: memory allocation in spinlock context

2013-03-01 Thread Alfred Perlstein

On 3/1/13 5:50 AM, Andriy Gapon wrote:

I am trying to understand if it is possible to allow memory allocations 
(M_NOWAIT,
of course) in a spinlock context.
I do not see any obvious architectural obstacles.
But the fact that all of the uma locks, system map lock, object locks, page 
queue
locks and so on are regular mutexes makes it impossible to allocate memory 
without
violating the fundamental lock ordering rules.

Could all of the above mentioned locks potentially be converted to spin mutexes?
(And that seems to be a large nasty change)
Are there any alternative possibilities?

BTW, currently we have at least one place where a memory allocation of this kind
is done stealthily (and thus dangerously?).  ACPI resume code must execute
AcpiLeaveSleepStatePrep with interrupts disabled and ACPICA code performs memory
allocations in that code path.  Since the interrupts are disabled by means of
intr_disable(), witness(9) and similar are completely oblivious of the fact.

Typically the need for such a facility means that the locks are being 
held for too long.


I think someone has suggested using a private allocator carving out of a 
pre-allocated space.


Depending on the subsystem you are allocating for this may work for you.

I am looking to do this for the kernel gzip routines so that we can do 
compressed kernel dumps as soon as I verify the bounds of the gzip 
allocations.


-Alfred
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: [patch] statfs does not detect -t nullfs -o union as a union mount

2013-03-01 Thread Jilles Tjoelker
On Thu, Feb 28, 2013 at 12:21:44AM +0200, Konstantin Belousov wrote:
 On Wed, Feb 27, 2013 at 10:31:42PM +0100, Jilles Tjoelker wrote:
  While testing recent changes to opendir(), I noticed that fstatfs() does
  not return the MNT_UNION flag for a -t nullfs -o union mount. As a
  result, opendir()/readdir() return files that exist in both top and
  bottom directories twice (at least . and ..). Other -o union mounts and
  -t unionfs mounts work correctly in this regard.

  The below patch passes through just the MNT_UNION flag of the nullfs
  mount itself. Perhaps more flags should be passed through.

  commit fce32a779af4eb977c9b96feb6e4f811d89f2881
  Author: Jilles Tjoelker jil...@stack.nl
  Date:   Sat Feb 23 22:22:39 2013 +0100
  
  nullfs: Preserve the MNT_UNION flag of the nullfs mount itself.
  
  This is needed so that opendir() can properly detect a union mount like
  mount -t nullfs -o union dir1 dir2.
  
  diff --git a/sys/fs/nullfs/null_vfsops.c b/sys/fs/nullfs/null_vfsops.c
  index 3724e0a..ff06f57 100644
  --- a/sys/fs/nullfs/null_vfsops.c
  +++ b/sys/fs/nullfs/null_vfsops.c
  @@ -313,7 +313,7 @@ nullfs_statfs(mp, sbp)
   
  /* now copy across the interesting information and fake the rest */
  sbp-f_type = mstat.f_type;
  -   sbp-f_flags = mstat.f_flags;
  +   sbp-f_flags = (sbp-f_flags  MNT_UNION) | mstat.f_flags;
  sbp-f_bsize = mstat.f_bsize;
  sbp-f_iosize = mstat.f_iosize;
  sbp-f_blocks = mstat.f_blocks;

 Would it make sense to preserve more flags from the upper mount ?
 I see a use for MNT_NOEXEC as well, at least.

Yes, preserving MNT_NOEXEC will make -t nullfs -o noexec work better, in
particular rtld's check for libraries loaded via environment variables.

In the same way MNT_RDONLY, MNT_NOSUID and MNT_NOSYMFOLLOW could be
preserved.

On the other hand, MNT_ROOTFS should probably not be passed through from
the underlying filesystem.

This would give
sbp-f_flags = (sbp-f_flags  (MNT_RDONLY | MNT_NOEXEC | MNT_NOSUID |
MNT_UNION | MNT_NOSYMFOLLOW) | (mstat.f_flags  ~MNT_ROOTFS);

-- 
Jilles Tjoelker
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: [patch] statfs does not detect -t nullfs -o union as a union mount

2013-03-01 Thread Konstantin Belousov
On Sat, Mar 02, 2013 at 01:00:33AM +0100, Jilles Tjoelker wrote:
 On Thu, Feb 28, 2013 at 12:21:44AM +0200, Konstantin Belousov wrote:
  On Wed, Feb 27, 2013 at 10:31:42PM +0100, Jilles Tjoelker wrote:
   While testing recent changes to opendir(), I noticed that fstatfs() does
   not return the MNT_UNION flag for a -t nullfs -o union mount. As a
   result, opendir()/readdir() return files that exist in both top and
   bottom directories twice (at least . and ..). Other -o union mounts and
   -t unionfs mounts work correctly in this regard.
 
   The below patch passes through just the MNT_UNION flag of the nullfs
   mount itself. Perhaps more flags should be passed through.
 
   commit fce32a779af4eb977c9b96feb6e4f811d89f2881
   Author: Jilles Tjoelker jil...@stack.nl
   Date:   Sat Feb 23 22:22:39 2013 +0100
   
   nullfs: Preserve the MNT_UNION flag of the nullfs mount itself.
   
   This is needed so that opendir() can properly detect a union mount 
   like
   mount -t nullfs -o union dir1 dir2.
   
   diff --git a/sys/fs/nullfs/null_vfsops.c b/sys/fs/nullfs/null_vfsops.c
   index 3724e0a..ff06f57 100644
   --- a/sys/fs/nullfs/null_vfsops.c
   +++ b/sys/fs/nullfs/null_vfsops.c
   @@ -313,7 +313,7 @@ nullfs_statfs(mp, sbp)

 /* now copy across the interesting information and fake the rest */
 sbp-f_type = mstat.f_type;
   - sbp-f_flags = mstat.f_flags;
   + sbp-f_flags = (sbp-f_flags  MNT_UNION) | mstat.f_flags;
 sbp-f_bsize = mstat.f_bsize;
 sbp-f_iosize = mstat.f_iosize;
 sbp-f_blocks = mstat.f_blocks;
 
  Would it make sense to preserve more flags from the upper mount ?
  I see a use for MNT_NOEXEC as well, at least.
 
 Yes, preserving MNT_NOEXEC will make -t nullfs -o noexec work better, in
 particular rtld's check for libraries loaded via environment variables.
 
 In the same way MNT_RDONLY, MNT_NOSUID and MNT_NOSYMFOLLOW could be
 preserved.
 
 On the other hand, MNT_ROOTFS should probably not be passed through from
 the underlying filesystem.
 
 This would give
 sbp-f_flags = (sbp-f_flags  (MNT_RDONLY | MNT_NOEXEC | MNT_NOSUID |
 MNT_UNION | MNT_NOSYMFOLLOW) | (mstat.f_flags  ~MNT_ROOTFS);
I think this is fine.


pgpbxLGJDTdz9.pgp
Description: PGP signature