Re: [PATCH 24/25] r/o bind mounts: track number of mount writers

2007-10-01 Thread Dave Hansen
On Mon, 2007-09-24 at 12:42 -0700, Andrew Morton wrote: 
> On Mon, 24 Sep 2007 12:28:11 -0700
> Dave Hansen <[EMAIL PROTECTED]> wrote:
> > On Mon, 2007-09-24 at 18:54 +0100, Christoph Hellwig wrote:
> > > As we already said in various messages, the percpu counters in here
> > > look rather fishy.  I'd recommend taking a look at the per-cpu
> > > superblock counters in XFS, as they've been debugged quite well
> > > now and could probably be lifted into a generic library for this
> > > kind of thing.  The code is mostly in fs/xfs/xfs_mount.c and
> > > can be spotted by being under #ifdef HAVE_PERCPU_SB.
> > > 
> > > It also handles cases like cpu hotplug nicely, which this code
> > > seems to work around by always iterating over all possible cpus;
> > > that might not be nice on a dual-core laptop with a distro kernel
> > > that also has to support big iron.
> > 
> > I'll take a look at xfs to see what I can get out of it.
> 
> And at include/linux/percpu_counter.h, please.

The basic incompatibility between what that provides and what I need is
that percpu_counters allow some fuzziness in the numbers.  One cpu can
be summing the numbers up while another is still adding to its local
percpu counter.  That's fine for statistics, but horrible for questions
like, "can anybody write to and corrupt my FS right now?"

It could probably be modified to do what I want, but it would still have
the "invented your own lock" problem, and would likely impact the
scalability of the existing "fuzzy" users.

We could introduce fuzzy and coherent variants of the function calls,
but that would probably introduce more code than what I have now for the
very specific mnt_writer_lock.
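
As a minimal sketch of the distinction (illustration only, not the
mnt_writer code; "writers" stands in for a hypothetical, already
initialized per-mount counter):

#include <linux/kernel.h>
#include <linux/percpu_counter.h>

static void show_writer_count(struct percpu_counter *writers)
{
	/*
	 * Fuzzy: just the central count.  Whatever is still sitting in
	 * the per-cpu deltas is ignored, and another cpu may be adding
	 * to them right now.  Fine for statistics.
	 */
	printk(KERN_INFO "approx writers: %lld\n",
	       (long long)percpu_counter_read(writers));

	/*
	 * Exact at the instant it is computed (the per-cpu deltas are
	 * folded in under the counter's internal lock), but nothing
	 * keeps the value there afterwards -- which is the guarantee a
	 * r/w->r/o transition actually needs.
	 */
	printk(KERN_INFO "exact writers:  %lld\n",
	       (long long)percpu_counter_sum(writers));
}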

> > There are basically two times when you have to do this
> > for_each_possible_cpu() stuff:
> > 1. when doing a r/w->r/o transition, which is rare, and
> >certainly not a fast path
> > 2. Where the per-cpu writer count underflows.  This requires
> >a _minimum_ of 1<<16 file opens (configurable) each of which
> >is closed on a different cpu than it was opened on.  Even
> >if you were trying, I'm not sure you'd notice the overhead.
> > 
> 
> Sounds like what you're doing is more akin to the local_t-based module
> refcounting.  `grep local_ kernel/module.c'.
> 
> That code should be converted from NR_CPUS to for_each_possible_cpu()..

I think that can get converted to use the percpu_counters pretty easily.
I've coded that up and sent it to lkml as an [RFC].  Rusty will forward
it on into mainline.

-- Dave


Re: [PATCH 24/25] r/o bind mounts: track number of mount writers

2007-09-25 Thread Dave Hansen
On Mon, 2007-09-24 at 16:15 -0700, Andrew Morton wrote:
> On Mon, 24 Sep 2007 16:05:37 -0700
> Dave Hansen <[EMAIL PROTECTED]> wrote:
> 
> > On Mon, 2007-09-24 at 15:25 -0700, Andrew Morton wrote:
> > > hm.  I saw that warning on my 2-way.  It has CONFIG_NR_CPUS=8 so perhaps
> > > the kernel has decided that this machine can possibly have eight CPUs.
> > > 
> > > It's an old super-micro board, doesn't have ACPI.
> > 
> > Well, it's looking like we only set cpu_possible_map from data we find
> > in the MP table, which makes sense.  The only question is how your
> > system gets more than ~8 possible cpus.  Do you have the .config handy?
> 
> http://userweb.kernel.org/~akpm/config-vmm.txt

I've reproduced this.  I'll do more runtime testing with all of the
debugging enabled, CONFIG_CPU_HOTPLUG=y, and NR_CPUS set to a high value.
It should help catch things like this in the future.

-- Dave


Re: [PATCH 24/25] r/o bind mounts: track number of mount writers

2007-09-24 Thread Andrew Morton
On Mon, 24 Sep 2007 16:05:37 -0700
Dave Hansen <[EMAIL PROTECTED]> wrote:

> On Mon, 2007-09-24 at 15:25 -0700, Andrew Morton wrote:
> > hm.  I saw that warning on my 2-way.  It has CONFIG_NR_CPUS=8 so perhaps
> > the kernel has decided that this machine can possibly have eight CPUs.
> > 
> > It's an old super-micro board, doesn't have ACPI.
> 
> Well, it's looking like we only set cpu_possible_map from data we find
> in the MP table, which makes sense.  The only question is how your
> system gets more than ~8 possible cpus.  Do you have the .config handy?
> 

http://userweb.kernel.org/~akpm/config-vmm.txt


Re: [PATCH 24/25] r/o bind mounts: track number of mount writers

2007-09-24 Thread Dave Hansen
On Mon, 2007-09-24 at 15:25 -0700, Andrew Morton wrote:
> hm.  I saw that warning on my 2-way.  It has CONFIG_NR_CPUS=8 so perhaps
> the kernel has decided that this machine can possibly have eight CPUs.
> 
> It's an old super-micro board, doesn't have ACPI.

Well, it's looking like we only set cpu_possible_map from data we find
in the MP table, which makes sense.  The only question is how your
system gets more than ~8 possible cpus.  Do you have the .config handy?

-- Dave



Re: [PATCH 24/25] r/o bind mounts: track number of mount writers

2007-09-24 Thread Andrew Morton
On Mon, 24 Sep 2007 15:06:42 -0700
Dave Hansen <[EMAIL PROTECTED]> wrote:

> On Sun, 2007-09-23 at 23:17 -0700, Andrew Morton wrote:
> > It looks like a false positive to me, but really, for a patchset of this
> > complexity and maturity I cannot fathom how it could have escaped any
> > lockdep testing.
> 
> I test with lockdep all the time.  The problem was that lockdep doesn't
> complain until you have 8 nested locks, and I only tested on a 4-cpu
> system.
> 
> I lowered the lockdep nesting limit to 3, and got the warning on my
> machine.
> 

hm.  I saw that warning on my 2-way.  It has CONFIG_NR_CPUS=8 so perhaps
the kernel has decided that this machine can possibly have eight CPUs.

It's an old super-micro board, doesn't have ACPI.

OEM ID: INTEL  Product ID: 440BX  APIC at: 0xFEE00000
Processor #0 6:8 APIC version 17
Processor #1 6:8 APIC version 17
I/O APIC #2 Version 17 at 0xFEC00000.
Enabling APIC mode:  Flat.  Using 1 I/O APICs
Processors: 2

...

Checking if this processor honours the WP bit even in supervisor mode... Ok.
Calibrating delay using timer specific routine.. 1707.03 BogoMIPS (lpj=3414067)
Mount-cache hash table entries: 512
CPU: After generic identify, caps: 0383fbff     
  
CPU: L1 I cache: 16K, L1 D cache: 16K
CPU: L2 cache: 256K
CPU: After all inits, caps: 0383fbff   0040  
  
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#0.
Compat vDSO mapped to e000.
Checking 'hlt' instruction... OK.
lockdep: not fixing up alternatives.
CPU0: Intel Pentium III (Coppermine) stepping 03
lockdep: not fixing up alternatives.
Booting processor 1/1 eip 2000
Initializing CPU#1
Calibrating delay using timer specific routine.. 1704.02 BogoMIPS (lpj=3408054)
CPU: After generic identify, caps: 0383fbff     
  
CPU: L1 I cache: 16K, L1 D cache: 16K
CPU: L2 cache: 256K
CPU: After all inits, caps: 0383fbff   0040  
  
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#1.
CPU1: Intel Pentium III (Coppermine) stepping 03
Total of 2 processors activated (3411.06 BogoMIPS).
ExtINT not setup in hardware but reported by MP table
ENABLING IO-APIC IRQs
..TIMER: vector=0x31 apic1=0 pin1=2 apic2=0 pin2=0
APIC timer registered as dummy, due to nmi_watchdog=1!
checking TSC synchronization [CPU#0 -> CPU#1]: passed.
Brought up 2 CPUs


One would think that the kernel would work out that eight CPUs ain't
possible on such a machine.  But we don't seem to print out any info which
allows me to confirm that this is really what the kernel did.
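
One way to confirm it would be a throwaway printk of the cpu counts at
boot.  This is a hypothetical debug aid, not output the kernel printed at
the time; both helpers are long-standing cpumask accessors:

#include <linux/kernel.h>
#include <linux/cpumask.h>

static void dump_cpu_counts(void)
{
	/* How many cpus the kernel considers possible vs. actually online. */
	printk(KERN_INFO "possible cpus: %d, online cpus: %d (NR_CPUS=%d)\n",
	       num_possible_cpus(), num_online_cpus(), NR_CPUS);
}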




Re: [PATCH 24/25] r/o bind mounts: track number of mount writers

2007-09-24 Thread Dave Hansen
On Sun, 2007-09-23 at 23:17 -0700, Andrew Morton wrote:
> It looks like a false positive to me, but really, for a patchset of this
> complexity and maturity I cannot fathom how it could have escaped any
> lockdep testing.

I test with lockdep all the time.  The problem was that lockdep doesn't
complain until you have 8 nested locks, and I only tested on a 4-cpu
system.

I lowered the lockdep nesting limit to 3, and got the warning on my
machine.

-- Dave



Re: [PATCH 24/25] r/o bind mounts: track number of mount writers

2007-09-24 Thread Andrew Morton
On Mon, 24 Sep 2007 12:28:11 -0700
Dave Hansen <[EMAIL PROTECTED]> wrote:

> On Mon, 2007-09-24 at 18:54 +0100, Christoph Hellwig wrote:
> > As we already said in various messages, the percpu counters in here
> > look rather fishy.  I'd recommend taking a look at the per-cpu
> > superblock counters in XFS, as they've been debugged quite well
> > now and could probably be lifted into a generic library for this
> > kind of thing.  The code is mostly in fs/xfs/xfs_mount.c and
> > can be spotted by being under #ifdef HAVE_PERCPU_SB.
> > 
> > It also handles cases like cpu hotplug nicely, which this code
> > seems to work around by always iterating over all possible cpus;
> > that might not be nice on a dual-core laptop with a distro kernel
> > that also has to support big iron.
> 
> I'll take a look at xfs to see what I can get out of it.

And at include/linux/percpu_counter.h, please.

> There are basically two times when you have to do this
> for_each_possible_cpu() stuff:
> 1. when doing a r/w->r/o transition, which is rare, and
>certainly not a fast path
> 2. Where the per-cpu writer count underflows.  This requires
>a _minimum_ of 1<<16 file opens (configurable) each of which
>is closed on a different cpu than it was opened on.  Even
>if you were trying, I'm not sure you'd notice the overhead.
> 

Sounds like what you're doing is more akin to the local_t-based module
refcounting.  `grep local_ kernel/module.c'.

That code should be converted from NR_CPUS to for_each_possible_cpu()..
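
Roughly the shape of that local_t pattern, with hypothetical names rather
than the actual kernel/module.c fields: gets and puts only touch the
current cpu's slot, and the total is only computed when somebody asks.

#include <linux/percpu.h>
#include <linux/cpumask.h>
#include <linux/cache.h>
#include <asm/local.h>

struct ref_slot {
	local_t count;
} ____cacheline_aligned_in_smp;

static DEFINE_PER_CPU(struct ref_slot, ref_slots);

static void ref_get(void)
{
	local_inc(&get_cpu_var(ref_slots).count);
	put_cpu_var(ref_slots);
}

static void ref_put(void)
{
	local_dec(&get_cpu_var(ref_slots).count);
	put_cpu_var(ref_slots);
}

/* The walk is the part where NR_CPUS vs. for_each_possible_cpu() matters. */
static long ref_total(void)
{
	long total = 0;
	int cpu;

	for_each_possible_cpu(cpu)
		total += local_read(&per_cpu(ref_slots, cpu).count);
	return total;
}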


Re: [PATCH 24/25] r/o bind mounts: track number of mount writers

2007-09-24 Thread Dave Hansen
On Mon, 2007-09-24 at 18:54 +0100, Christoph Hellwig wrote:
> As we already said in various messages, the percpu counters in here
> look rather fishy.  I'd recommend taking a look at the per-cpu
> superblock counters in XFS, as they've been debugged quite well
> now and could probably be lifted into a generic library for this
> kind of thing.  The code is mostly in fs/xfs/xfs_mount.c and
> can be spotted by being under #ifdef HAVE_PERCPU_SB.
> 
> It also handles cases like cpu hotplug nicely, which this code
> seems to work around by always iterating over all possible cpus;
> that might not be nice on a dual-core laptop with a distro kernel
> that also has to support big iron.

I'll take a look at xfs to see what I can get out of it.

There are basically two times when you have to do this
for_each_possible_cpu() stuff:
1. When doing a r/w->r/o transition, which is rare, and
   certainly not a fast path.
2. When the per-cpu writer count underflows.  This requires
   a _minimum_ of 1<<16 file opens (configurable), each of which
   is closed on a different cpu than it was opened on.  Even
   if you were trying, I'm not sure you'd notice the overhead
   (a sketch of this slow path follows below).
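
A rough sketch of case 2, with hypothetical names -- the real patch keeps
its count inside struct mnt_writer and coalesces under the per-cpu locks,
so this only shows where the expensive all-cpus walk would kick in:

#include <linux/kernel.h>
#include <linux/percpu.h>

#define WRITER_UNDERFLOW_LIMIT	(-(1 << 16))	/* mirrors the configurable limit */

static DEFINE_PER_CPU(long, writer_delta);

/* Hypothetical stand-in for the rare for_each_possible_cpu() walk. */
static void coalesce_all_cpus(void)
{
}

static void drop_write_local(void)
{
	long *delta = &get_cpu_var(writer_delta);

	/* A close on a different cpu than the open just drives this
	 * cpu's delta negative; nobody walks all the cpus for that. */
	(*delta)--;
	if (unlikely(*delta < WRITER_UNDERFLOW_LIMIT))
		coalesce_all_cpus();
	put_cpu_var(writer_delta);
}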

-- Dave



Re: [PATCH 24/25] r/o bind mounts: track number of mount writers

2007-09-24 Thread Christoph Hellwig
On Mon, Sep 24, 2007 at 12:10:35PM -0700, Andrew Morton wrote:
> > As we already said in various messages, the percpu counters in here
> > look rather fishy.  I'd recommend taking a look at the per-cpu
> > superblock counters in XFS, as they've been debugged quite well
> > now and could probably be lifted into a generic library for this
> > kind of thing.  The code is mostly in fs/xfs/xfs_mount.c and
> > can be spotted by being under #ifdef HAVE_PERCPU_SB.
> > 
> > It also handles cases like cpu hotplug nicely, which this code
> > seems to work around by always iterating over all possible cpus;
> > that might not be nice on a dual-core laptop with a distro kernel
> > that also has to support big iron.
> 
> hm.  How come xfs invented a new version of percpu_counters?

This code actually predates the generic percpu_counters, even if it
was merged into mainline later.  Most likely neither Dave, who wrote it,
nor I, who reviewed it before it was merged, thought of percpu_counters.
Then again, this code is considerably more complex due to features
actually needed in a very hot fastpath.


Re: [PATCH 24/25] r/o bind mounts: track number of mount writers

2007-09-24 Thread Andrew Morton
On Mon, 24 Sep 2007 18:54:11 +0100
Christoph Hellwig <[EMAIL PROTECTED]> wrote:

> As we already said in various messages, the percpu counters in here
> look rather fishy.  I'd recommend taking a look at the per-cpu
> superblock counters in XFS, as they've been debugged quite well
> now and could probably be lifted into a generic library for this
> kind of thing.  The code is mostly in fs/xfs/xfs_mount.c and
> can be spotted by being under #ifdef HAVE_PERCPU_SB.
> 
> It also handles cases like cpu hotplug nicely, which this code
> seems to work around by always iterating over all possible cpus;
> that might not be nice on a dual-core laptop with a distro kernel
> that also has to support big iron.

hm.  How come xfs invented a new version of percpu_counters?


Re: [PATCH 24/25] r/o bind mounts: track number of mount writers

2007-09-24 Thread Christoph Hellwig
As we already said in various messages, the percpu counters in here
look rather fishy.  I'd recommend taking a look at the per-cpu
superblock counters in XFS, as they've been debugged quite well
now and could probably be lifted into a generic library for this
kind of thing.  The code is mostly in fs/xfs/xfs_mount.c and
can be spotted by being under #ifdef HAVE_PERCPU_SB.

It also handles cases like cpu hotplug nicely, which this code
seems to work around by always iterating over all possible cpus;
that might not be nice on a dual-core laptop with a distro kernel
that also has to support big iron.



Re: [PATCH 24/25] r/o bind mounts: track number of mount writers

2007-09-24 Thread Arjan van de Ven

> It looks like a false positive to me, but really, for a patchset of
> this complexity and maturity I cannot fathom how it could have
> escaped any lockdep testing.

the code tries to implement per-cpu spinlocks, or rather it tries
to bring back the brlocks from way past.  Cute.

we can educate lockdep about this quite easily; but isn't there some
primitive already in existence that we can use?
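
For reference, the brlock-style shape being described, with hypothetical
names, plus the usual way of educating lockdep about same-class nesting:
spin_lock_nested() with a distinct subclass per acquisition.  Lockdep only
hands out a few subclasses (MAX_LOCKDEP_SUBCLASSES is 8), so even that
annotation does not stretch to a large cpu_possible_map.

#include <linux/spinlock.h>
#include <linux/percpu.h>
#include <linux/cpumask.h>
#include <linux/smp.h>

/* Each lock still needs spin_lock_init() at boot, like the patch's
 * fs_initcall() does for mnt_writers. */
static DEFINE_PER_CPU(spinlock_t, pcpu_lock);

/* Fast path: touch only this cpu's lock, so the cacheline stays local. */
static void fast_path(void)
{
	int cpu = get_cpu();

	spin_lock(&per_cpu(pcpu_lock, cpu));
	/* ... cpu-local work ... */
	spin_unlock(&per_cpu(pcpu_lock, cpu));
	put_cpu();
}

/* Rare global path: take every possible cpu's lock, in cpu order.  All
 * the locks share one lock class, so the nesting has to be spelled out
 * for lockdep. */
static void lock_all_cpus(void)
{
	int cpu;

	for_each_possible_cpu(cpu)
		spin_lock_nested(&per_cpu(pcpu_lock, cpu), cpu);
}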



Re: [PATCH 24/25] r/o bind mounts: track number of mount writers

2007-09-24 Thread Andrew Morton
On Thu, 20 Sep 2007 12:53:20 -0700 Dave Hansen <[EMAIL PROTECTED]> wrote:

> This is the real meat of the entire series.  It actually
> implements the tracking of the number of writers to a mount.
> However, it causes scalability problems because there can
> be hundreds of cpus doing open()/close() on files on the
> same mnt at the same time.  Even an atomic_t in the mnt
> has massive scaling problems because the cacheline gets
> so terribly contended.
> 
> This uses a statically-allocated percpu variable.  All
> operations are local to a cpu as long as that cpu operates on
> the same mount, and there are no writer count imbalances.
> Writer count imbalances happen when a write is taken on one
> cpu, and released on another, like when an open/close pair
> is performed on two different cpus because the task moved.


Did you test with lockdep enabled?

=============================================
[ INFO: possible recursive locking detected ]
2.6.23-rc7-mm1 #1
---------------------------------------------
swapper/1 is trying to acquire lock:
 (&writer->lock){--..}, at: [<c0197a32>]
lock_and_coalesce_cpu_mnt_writer_counts+0x32/0x70

but task is already holding lock:
 (&writer->lock){--..}, at: [<c0197a32>]
lock_and_coalesce_cpu_mnt_writer_counts+0x32/0x70

other info that might help us debug this:
1 lock held by swapper/1:
 #0:  (&writer->lock){--..}, at: [<c0197a32>]
lock_and_coalesce_cpu_mnt_writer_counts+0x32/0x70

stack backtrace:
 [<c0103ffa>] show_trace_log_lvl+0x1a/0x30
 [<c0104b82>] show_trace+0x12/0x20
 [<c0104c96>] dump_stack+0x16/0x20
 [<c0144dc5>] __lock_acquire+0xde5/0x10a0
 [<c01450fa>] lock_acquire+0x7a/0xa0
 [<c03e734c>] _spin_lock+0x2c/0x40
 [<c0197a32>] lock_and_coalesce_cpu_mnt_writer_counts+0x32/0x70
 [<c01982c6>] mntput_no_expire+0x36/0xc0
 [<c0188f15>] path_release_on_umount+0x15/0x20
 [<c0198930>] sys_umount+0x40/0x230
 [<c010070b>] name_to_dev_t+0x9b/0x270
 [<c05230c2>] prepare_namespace+0x62/0x1b0
 [<c05226ca>] kernel_init+0x21a/0x320
 [<c0103b47>] kernel_thread_helper+0x7/0x10
 =======================

It looks like a false positive to me, but really, for a patchset of this
complexity and maturity I cannot fathom how it could have escaped any
lockdep testing.


[PATCH 24/25] r/o bind mounts: track number of mount writers

2007-09-20 Thread Dave Hansen

If we can't pull just this patch in for now, it would be
great to get everything leading up to here pulled in.  I've
re-implemented this several ways, and it has never caused
the preceding patches to change at all.

--

This is the real meat of the entire series.  It actually
implements the tracking of the number of writers to a mount.
However, it causes scalability problems because there can
be hundreds of cpus doing open()/close() on files on the
same mnt at the same time.  Even an atomic_t in the mnt
has massive scaling problems because the cacheline gets
so terribly contended.

This uses a statically-allocated percpu variable.  All
operations are local to a cpu as long as that cpu operates on
the same mount, and there are no writer count imbalances.
Writer count imbalances happen when a write is taken on one
cpu, and released on another, like when an open/close pair
is performed on two different cpus because the task moved.

Upon a remount,ro request, all of the data from the percpu
variables is collected (expensive, but very rare) and we
determine if there are any outstanding writers to the mount.

I've written a little benchmark that sits in a loop for a couple
of seconds on several cpus in parallel, doing open/write/close
loops.

http://sr71.net/~dave/linux/openbench.c

The code in here is a worst-possible case for this patch.
It does opens on a _pair_ of files in two different mounts
in parallel.  This should cause my code to lose its "operate
on the same mount" optimization completely.  This worst-case
scenario causes a 3% degradation in the benchmark.

I could probably get rid of even this 3%, but it would be
more complex than what I have here, and I think this is
getting into acceptable territory.  In practice, I expect
writing more than 3 bytes to a file, as well as disk I/O
to mask any effects that this has.

(To get rid of that 3%, we could have an #defined number
 of mounts in the percpu variable.  So, instead of a CPU
 getting to operate only on percpu data when it accesses
 only one mount, it could stay on percpu data when it
 accesses N or fewer mounts.)
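
As a purely hypothetical sketch of that alternative (not part of the
patch below), the per-cpu state could carry a small fixed array of slots
instead of a single mnt pointer, so a cpu only falls back to the
coalescing slow path once it touches more than N distinct mounts:

#include <linux/spinlock.h>
#include <linux/cache.h>
#include <linux/mount.h>

#define MNT_WRITER_SLOTS 4	/* the "#defined number of mounts" */

struct mnt_writer_slot {
	struct vfsmount *mnt;
	unsigned long count;
};

struct mnt_writer {
	spinlock_t lock;
	struct mnt_writer_slot slots[MNT_WRITER_SLOTS];
} ____cacheline_aligned_in_smp;

A write on a mount already cached in slots[] would just bump that slot's
count; only the (N+1)th distinct mount would force the expensive
coalesce.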

Signed-off-by: Dave Hansen <[EMAIL PROTECTED]>
---

 lxc-dave/fs/namespace.c|  205 ++---
 lxc-dave/include/linux/mount.h |8 +
 2 files changed, 198 insertions(+), 15 deletions(-)

diff -puN fs/namespace.c~numa_mnt_want_write fs/namespace.c
--- lxc/fs/namespace.c~numa_mnt_want_write	2007-09-20 12:16:23.000000000 -0700
+++ lxc-dave/fs/namespace.c	2007-09-20 12:16:23.000000000 -0700
@@ -17,6 +17,7 @@
 #include <linux/quotaops.h>
 #include <linux/acct.h>
 #include <linux/capability.h>
+#include <linux/cpumask.h>
 #include <linux/module.h>
 #include <linux/sysfs.h>
 #include <linux/seq_file.h>
@@ -52,6 +53,8 @@ static inline unsigned long hash(struct 
return tmp & hash_mask;
 }
 
+#define MNT_WRITER_UNDERFLOW_LIMIT -(1<<16)
+
 struct vfsmount *alloc_vfsmnt(const char *name)
 {
struct vfsmount *mnt = kmem_cache_zalloc(mnt_cache, GFP_KERNEL);
@@ -65,6 +68,7 @@ struct vfsmount *alloc_vfsmnt(const char
	INIT_LIST_HEAD(&mnt->mnt_share);
	INIT_LIST_HEAD(&mnt->mnt_slave_list);
	INIT_LIST_HEAD(&mnt->mnt_slave);
+   atomic_set(&mnt->__mnt_writers, 0);
if (name) {
int size = strlen(name) + 1;
char *newname = kmalloc(size, GFP_KERNEL);
@@ -85,6 +89,84 @@ struct vfsmount *alloc_vfsmnt(const char
  * we can determine when writes are able to occur to
  * a filesystem.
  */
+/*
+ * __mnt_is_readonly: check whether a mount is read-only
+ * @mnt: the mount to check for its write status
+ *
+ * This shouldn't be used directly outside of the VFS.
+ * It does not guarantee that the filesystem will stay
+ * r/w, just that it is right *now*.  This can not and
+ * should not be used in place of IS_RDONLY(inode).
+ * mnt_want/drop_write() will _keep_ the filesystem
+ * r/w.
+ */
+int __mnt_is_readonly(struct vfsmount *mnt)
+{
+   return (mnt->mnt_sb->s_flags & MS_RDONLY);
+}
+EXPORT_SYMBOL_GPL(__mnt_is_readonly);
+
+struct mnt_writer {
+   /*
+* If holding multiple instances of this lock, they
+* must be ordered by cpu number.
+*/
+   spinlock_t lock;
+   unsigned long count;
+   struct vfsmount *mnt;
+} cacheline_aligned_in_smp;
+static DEFINE_PER_CPU(struct mnt_writer, mnt_writers);
+
+static int __init init_mnt_writers(void)
+{
+   int cpu;
+   for_each_possible_cpu(cpu) {
+   struct mnt_writer *writer = &per_cpu(mnt_writers, cpu);
+   spin_lock_init(&writer->lock);
+   writer->count = 0;
+   }
+   return 0;
+}
+fs_initcall(init_mnt_writers);
+
+static void mnt_unlock_cpus(void)
+{
+   int cpu;
+   struct mnt_writer *cpu_writer;
+
+   for_each_possible_cpu(cpu) {
+   cpu_writer = &per_cpu(mnt_writers, cpu);
+   spin_unlock(&cpu_writer->lock);
+   }
+}
+
+static inline void __clear_mnt_count(struct mnt_writer *cpu_writer)
+{
+   if (!cpu_writer->mnt)
+   return;
+   
