I added some debugging to the teardown code and managed to reproduce this.
What we see is that we unbind, then attempt and fail a bind on the array,
and only after that do the deletes for the unbind complete. It is this
ordering that causes the bind failure:
[ 3.476504] md: bind<sda1>
[...]
[ 35.097882] md: md0 stopped.
[ 35.097897] md: unbind<sda1>
[ 35.097907] APW: sysfs_remove_link ret<0>
[ 35.110198] md: export_rdev(sda1)
[ 35.113254] md: bind<sda1>
[ 35.113297] ------------[ cut here ]------------
[ 35.113300] WARNING: at
/home/apw/build/jaunty/ubuntu-jaunty/fs/sysfs/dir.c:462
sysfs_add_one+0x4c/0x50()
[...]
[ 35.115126] APW: deleted something
Here, where the mount happened to succeed, note that the delete falls in
the expected place:
[ 3.479917] md: bind<sda5>
[...]
[ 35.118235] md: md1 stopped.
[ 35.118240] md: unbind<sda5>
[ 35.118244] APW: sysfs_remove_link ret<0>
[ 35.140164] md: export_rdev(sda5)
[ 35.142276] APW: deleted something
[ 35.143848] md: bind<sda1>
[ 35.152288] md: bind<sda5>
[ 35.158571] raid1: raid set md1 active with 1 out of 2 mirrors
If we look at the code for stopping the array we see the following:
static int do_md_stop(mddev_t *mddev, int mode, int is_open)
{
	[...]
		rdev_for_each(rdev, tmp, mddev)
			if (rdev->raid_disk >= 0) {
				char nm[20];
				sprintf(nm, "rd%d", rdev->raid_disk);
				sysfs_remove_link(&mddev->kobj, nm);
			}

		/* make sure all md_delayed_delete calls have finished */
		flush_scheduled_work();

		export_array(mddev);
	[...]
Note that we call flush_scheduled_work() to wait for any pending
md_delayed_delete calls and only then export the array. However, it is
export_array() itself which schedules these deletes:
static void export_array(mddev_t *mddev)
{
	[...]
	rdev_for_each(rdev, tmp, mddev) {
		if (!rdev->mddev) {
			MD_BUG();
			continue;
		}
		kick_rdev_from_array(rdev);
	}
	[...]
}
It does this via kick_rdev_from_array(), which calls unbind_rdev_from_array():
static void kick_rdev_from_array(mdk_rdev_t *rdev)
{
	unbind_rdev_from_array(rdev);
	export_rdev(rdev);
}
which in turn schedules the delayed delete:
static void unbind_rdev_from_array(mdk_rdev_t *rdev)
{
	[...]
	rdev->sysfs_state = NULL;
	/* We need to delay this, otherwise we can deadlock when
	 * writing to 'remove' to "dev/state". We also need
	 * to delay it due to rcu usage.
	 */
	synchronize_rcu();
	INIT_WORK(&rdev->del_work, md_delayed_delete);
	kobject_get(&rdev->kobj);
	schedule_work(&rdev->del_work);
}
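For reference, the work function being scheduled here simply tears down the
rdev's kobject; in kernels of this vintage md_delayed_delete() looks roughly
like the following (reproduced from memory, so treat it as a sketch):

static void md_delayed_delete(struct work_struct *ws)
{
	mdk_rdev_t *rdev = container_of(ws, mdk_rdev_t, del_work);
	kobject_del(&rdev->kobj);
	kobject_put(&rdev->kobj);
}

Until this work actually runs, the kobject and its sysfs directory still
exist, so a subsequent bind of the same device under the same name hits the
sysfs_add_one warning and fails with -EEXIST.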
So in reality we want to wait for these deletes not before export_array()
but after it. Testing with a patch which moves the flush accordingly seems
to resolve the issue.
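A sketch of such a patch, assuming no other code depends on the current
ordering, would simply move the flush below the export in do_md_stop():

	rdev_for_each(rdev, tmp, mddev)
		if (rdev->raid_disk >= 0) {
			char nm[20];
			sprintf(nm, "rd%d", rdev->raid_disk);
			sysfs_remove_link(&mddev->kobj, nm);
		}

	export_array(mddev);

	/* wait for the md_delayed_delete calls queued by export_array() */
	flush_scheduled_work();

With that ordering the flush sees the del_work items that export_array()
has just queued, so the sysfs entries are guaranteed gone before the array
can be re-assembled.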
--
Degraded RAID boot fails: kobject_add_internal failed for dev-sda1 with
-EEXIST, don't try to register things with the same name in the same directory
https://bugs.launchpad.net/bugs/334994