Re: Fixing gave up waiting for init of module libcrc32c.

2010-03-31 Thread Brandon Philips
On 09:36 Tue 30 Mar 2010, Rusty Russell wrote:
 The real fix here is to drop the lock, like Brandon suggested, but
 we need to do it more carefully: when we re-acquire the lock we need
 to re-lookup the symbol in case the module has vanished or changed.
 
 Brandon, I can't see how libcrc32c's module_init calls
 crypto_alloc_shash, but the problem is reproducible with simple
 example modules.  Does it fix your problem?

Did you see my email yesterday explaining how this came about? Does
that answer your question?

http://www.mail-archive.com/linux-crypto@vger.kernel.org/msg04363.html

 module: drop the lock while waiting for module to complete initialization.
 
 This fixes gave up waiting for init of module libcrc32c. which
 happened at boot time due to multiple parallel module loads.

Since the root cause is known a bit more detail can be added:

This fixes gave up waiting for init of module libcrc32c. which is
triggered by parallel modprobes by udev for the same driver. For
example on a machine with two network interfaces:

02:00.0 Ethernet controller [0200]: Broadcom Corporation NetXtreme II
BCM5709 Gigabit Ethernet [14e4:1639] (rev 01)
02:00.1 Ethernet controller [0200]: Broadcom Corporation NetXtreme II
BCM5709 Gigabit Ethernet [14e4:1639] (rev 01)

udev will fork off two modprobe processes:

/sbin/modprobe pci:v14E4d1639sv1014sd037Cbc02sc00i00
/sbin/modprobe pci:v14E4d1639sv1014sd037Cbc02sc00i00

 The problem was a deadlock: we wait for a module to finish
 initializing, but we keep the module_lock mutex so it can't
 complete.  In particular, this could reasonably happen if libcrc32c
 does a request_module() in its initialization routine.
 
 So we change use_module() to return an errno rather than a bool, and if
 it's -EBUSY we drop the lock and wait in the caller, then reaquire the
 lock.
 
 Reported-by: Brandon Philips bran...@ifup.org
 Signed-off-by: Rusty Russell ru...@rustcorp.com.au

Reviewed the patch and it looks good and I tested it on the machine
and it works. A couple of trivial things inline below if you care.

Signed-off-by: Brandon Philips bran...@ifup.org
Cc: sta...@kernel.org

Thanks Rusty.

Cheers,

Brandon


 - if (b == NULL || already_uses(a, b)) return 1;
 + if (b == NULL || already_uses(a, b)) return 0;

if (b == NULL || already_uses(a, b))
return 0;

   /* If we're interrupted or time out, we fail. */
 - if (wait_event_interruptible_timeout(
 - module_wq, (err = strong_try_module_get(b)) != -EBUSY,
 - 30 * HZ) = 0) {
 - printk(%s: gave up waiting for init of module %s.\n,
 -a-name, b-name);
 - return 0;
 - }
 -
 - /* If strong_try_module_get() returned a different error, we fail. */
 + err = strong_try_module_get(b);
   if (err)
 - return 0;
 + return err;
  
   DEBUGP(Allocating new usage for %s.\n, a-name);
   use = kmalloc(sizeof(*use), GFP_ATOMIC);
   if (!use) {
   printk(%s: out of memory loading\n, a-name);

printk(KERN_ERR %s: out of memory loading\n, a-name);

   module_put(b);
 - return 0;
 + return -ENOMEM;
   }
  
   use-module_which_uses = a;
   list_add(use-list, b-modules_which_use_me);
   no_warn = sysfs_create_link(b-holders_dir, a-mkobj.kobj, a-name);
 - return 1;
 + return 0;
  }
  EXPORT_SYMBOL_GPL(use_module);
  
 @@ -823,7 +815,7 @@ static inline void module_unload_free(st
  
  int use_module(struct module *a, struct module *b)
  {
 - return strong_try_module_get(b) == 0;
 + return strong_try_module_get(b);
  }
  EXPORT_SYMBOL_GPL(use_module);
  
 @@ -994,17 +986,39 @@ static const struct kernel_symbol *resol
   struct module *owner;
   const struct kernel_symbol *sym;
   const unsigned long *crc;
 + DEFINE_WAIT(wait);
 + int err;
 + long timeleft = 30 * HZ;
  
 +again:
   sym = find_symbol(name, owner, crc,
 !(mod-taints  (1  TAINT_PROPRIETARY_MODULE)), 
 true);
 - /* use_module can fail due to OOM,
 -or module initialization or unloading */
 - if (sym) {
 - if (!check_version(sechdrs, versindex, name, mod, crc, owner)
 - || !use_module(mod, owner))
 - sym = NULL;
 + if (!sym)
 + return NULL;
 +
 + if (!check_version(sechdrs, versindex, name, mod, crc, owner))
 + return NULL;
 +
 + prepare_to_wait(module_wq, wait, TASK_INTERRUPTIBLE);
 + err = use_module(mod, owner);
 + if (likely(!err) || err != -EBUSY || signal_pending(current)) {
 + finish_wait(module_wq, wait);
 + return err ? NULL : sym;
   }
 - return sym;
 +
 + /* Module is still loading.  Drop lock and wait. */
 + mutex_unlock(module_mutex);
 + timeleft = schedule_timeout(timeleft);
 + mutex_lock

Re: Fixing gave up waiting for init of module libcrc32c.

2010-03-30 Thread Brandon Philips
On 20:29 Sat 20 Mar 2010, Herbert Xu wrote:
 On Fri, Mar 19, 2010 at 10:23:25PM -0700, David Miller wrote:
 
  I hear what you're saying Herbert, but thinking about this a bit I
  really think we should make this situation work instead of fail.
 
 I think the initial report perhaps painted this in a slight
 different fashion than what it really is.  The code that was
 looping in module.c is not trying to load libcrc32c, but rather
 it is trying to get a reference on the already-loaded libcrc32c
 module.

You are correct. Sorry for the confusion.

 AFAICS the only way to make it work would be to reload the
 module in question when we can't get a reference on it.  But
 that would entail recursively loading a module during the process
 of loading another module.
 
 Rusty can chime in on whether this is doable.
 
 I think I have a good guess as to why this problem is occuring
 for Brandon.  It is probably the result of two near-simultaneous
 modprobes, one issued against libcrc32c and one against bnx2x.
 
 So if Suse were doing some kind of parallel booting where multiple
 modules may be loaded together then this could occur.

It is actually due to two nearly simultaneous modprobes of bnx2x from
udev since there are two devices:

02:00.0 Ethernet controller [0200]: Broadcom Corporation NetXtreme II
BCM5709 Gigabit Ethernet [14e4:1639] (rev 01)
02:00.1 Ethernet controller [0200]: Broadcom Corporation NetXtreme II
BCM5709 Gigabit Ethernet [14e4:1639] (rev 01)

Output from my modprobe strace wrappers:

Process 4589 (first to get to libcrc32c)


4589  execve(/sbin/modprobe.bin, [/sbin/modprobe.bin, 
pci:v14E4d1639sv1014...], [/* 14 vars */]) = 0

...

4589  open(/lib/modules/2.6.32.9-debug/kernel/lib/libcrc32c.ko, O_RDONLY) = 3
4589  stat(/sys/module, {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
4589  stat(/sys/module/libcrc32c, 0x7fff3d03bef0) = -1 ENOENT (No such file 
or directory)
4589  read(3, \177E, 2)   = 2
4589  lseek(3, 0, SEEK_SET) = 0
4589  fstat(3, {st_mode=S_IFREG|0644, st_size=90482, ...}) = 0
4589  mmap(NULL, 90482, PROT_READ|PROT_WRITE, MAP_PRIVATE, 3, 0) = 
0x7f7963d49000
4589  init_module(0x7f7963d49000, 90482, ) = 0
4589  munmap(0x7f7963d49000, 90482) = 0
4589  close(3)  = 0
4589  open(/lib/modules/2.6.32.9-debug/kernel/drivers/net/bnx2.ko, O_RDONLY) 
= 3
4589  stat(/sys/module, {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
4589  stat(/sys/module/bnx2, 0x7fff3d03bf90) = -1 ENOENT (No such file or 
directory)
4589  read(3, \177E, 2)   = 2
4589  lseek(3, 0, SEEK_SET) = 0
4589  fstat(3, {st_mode=S_IFREG|0644, st_size=950325, ...}) = 0
4589  mmap(NULL, 950325, PROT_READ|PROT_WRITE, MAP_PRIVATE, 3, 0) = 
0x7f7963c4c000
4589  init_module(0x7f7963c4c000, 950325, ) = 0

Process 4590 (loses out on libcrc32c and gets stuck)


4590  execve(/sbin/modprobe.bin, [/sbin/modprobe.bin, 
pci:v14E4d1639sv1014...], [/* 14 vars */]) = 0



4590  open(/lib/modules/2.6.32.9-debug/kernel/lib/libcrc32c.ko,O_RDONLY) = 3
4590  stat(/sys/module, {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
4590  stat(/sys/module/libcrc32c, 0x752a0ef0) = -1 ENOENT (No such file 
or directory)
4590  read(3, \177E, 2)   = 2
4590  lseek(3, 0, SEEK_SET) = 0
4590  fstat(3, {st_mode=S_IFREG|0644, st_size=90482, ...}) = 0
4590  mmap(NULL, 90482, PROT_READ|PROT_WRITE, MAP_PRIVATE, 3, 0) = 
0x7f1397a1b000
4590  init_module(0x7f1397a1b000, 90482, ) = -1 EEXIST (File exists)
4590  close(3)  = 0
4590  open(/lib/modules/2.6.32.9-debug/kernel/drivers/net/bnx2.ko, O_RDONLY) 
= 3
4590  stat(/sys/module, {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
4590  stat(/sys/module/bnx2, 0x752a0f90) = -1 ENOENT (No such file or 
directory)
4590  read(3, \177E, 2)   = 2
4590  lseek(3, 0, SEEK_SET) = 0
4590  fstat(3, {st_mode=S_IFREG|0644, st_size=950325, ...}) = 0
4590  mmap(NULL, 950325, PROT_READ|PROT_WRITE, MAP_PRIVATE, 3, 0) = 
0x7f139791e000
4590  init_module(0x7f139791e000, 950325, ) = -1 ENOENT (No such file or 
directory)
4590  write(2, FATAL: Error inserting bnx2 (/li..., 145) = 145

As you can see 4589 inserts libcrc32c and the other starts inserting
bnx2 after seeing libcrc32c was loaded. Creating our situation. Phew,
took me a bit to see what was really going on :)

Side note: Intially I thought that modprobe was forking for each
module dependency but that was incorrect after looking at the
code. udev is forking off two modprobes almost at the same time for
each bnx2x device alias when udevadm trigger is called during the boot
sequence.

FWIW, I don't think this is a SUSE thing as our udev is pretty lightly
patched.

Cheers,

Brandon
--
To unsubscribe from this list: send the line unsubscribe linux-crypto in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Fixing gave up waiting for init of module libcrc32c.

2010-03-19 Thread Brandon Philips
On 09:08 Sat 20 Mar 2010, Herbert Xu wrote:
 On Sat, Mar 20, 2010 at 09:01:15AM +0800, Herbert Xu wrote:
 
  IIRC modprobe(8) in user-space is supposed to load all dependencies
  prior to loading the main module.  So something must've gone wrong
  before bnx2/bnx2x was loaded.  Please strace it to see why libcrc32c
  wasn't loaded successfully before the kernel ever saw bnx2/bnx2x.
 
 In fact I just tried it on my machine and modprobe bnx2x correctly
 loaded libcrc32c + crc32c before loading bnx2x (you don't need the
 hardware, it'll stay loaded).  My kernel is 2.6.33-rc8.
 
 I suspect it's a user-space problem.  I'm using module-init-tools
 3.4-1 from Debian.

This is module-init-tools 3.11.1 and SUSE Linux Enterprise SP1 RC1. I
can only reproduce on boot (thus why I changed bnx2 which I have) so
perhaps SUSE is doing something odd in how it loads modules at boot.

I will trace the boot process and get more information on exactly what
happens before it gets stuck.

Do you see how it is getting stuck though with modprobe bnx2 and
modprobe libcrc32c going at the same time though?

Cheers,

Brandon
--
To unsubscribe from this list: send the line unsubscribe linux-crypto in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html