========
ANALYSIS
========

#0 [ffff880036a73b38] schedule at ffffffff8175e320
#1 [ffff880036a73bc0] xfs_ail_push_all_sync at ffffffffa02e5478 [xfs]
#2 [ffff880036a73c30] xfs_log_quiesce at ffffffffa02e0b67 [xfs]
#3 [ffff880036a73c50] xfs_log_unmount at ffffffffa02e0bb6 [xfs]
#4 [ffff880036a73c70] xfs_mountfs at ffffffffa029332a [xfs]

Analyzing the stack trace, we can see that xfs_mountfs() only calls
"xfs_log_unmount" on an error path, which here points to the XFS
filesystem being corrupted:

636 xfs_mountfs(
637     xfs_mount_t *mp)
638 {
639     xfs_sb_t *sbp = &(mp->m_sb);
640     xfs_inode_t *rip;
...
836     /*
837      * Get and sanity-check the root inode.
838      * Save the pointer to it in the mount structure.
839      */
840     error = xfs_iget(mp, NULL, sbp->sb_rootino, 0, XFS_ILOCK_EXCL, &rip);
841     if (error) {
842         xfs_warn(mp, "failed to read root inode");
843         goto out_log_dealloc;
844     }
845
...
955  out_log_dealloc:
956     xfs_log_unmount(mp);
...

So we DO KNOW your XFS is considered to be CORRUPTED (by XFS function
xfs_iget(), called for the root inode as a sanity check).
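
To make the error path above concrete, here is a minimal userspace sketch
in C of the same goto-based unwinding. It is only a model: model_mount,
model_read_root_inode, model_log_unmount and model_mountfs are hypothetical
stand-ins (not kernel functions), and the error value is arbitrary.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical model of the mount state; not the kernel's xfs_mount. */
struct model_mount {
    int root_inode_ok;     /* 0 simulates xfs_iget() failing on the root inode */
    int log_mounted;       /* 1 while the (modelled) log is set up */
    int log_unmount_calls; /* how many times the cleanup path ran */
};

/* Stand-in for reading the root inode; -EIO is an arbitrary error value. */
static int model_read_root_inode(const struct model_mount *mp)
{
    return mp->root_inode_ok ? 0 : -EIO;
}

/* Stand-in for xfs_log_unmount(): tears the modelled log down. */
static void model_log_unmount(struct model_mount *mp)
{
    mp->log_unmount_calls++;
    mp->log_mounted = 0;
}

/* Same shape as the excerpt: on error, jump to the log cleanup label. */
static int model_mountfs(struct model_mount *mp)
{
    int error;

    assert(mp != NULL);
    mp->log_mounted = 1; /* stands in for the earlier log mount step */

    error = model_read_root_inode(mp);
    if (error) {
        fprintf(stderr, "failed to read root inode\n");
        goto out_log_dealloc;
    }
    return 0;

out_log_dealloc:
    model_log_unmount(mp);
    return error;
}
```

In this model the cleanup function runs only when the root inode read
fails, which is the situation the stack trace shows.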

Either way, let's continue debugging to make sure we understand why XFS
didn't give us an error about the filesystem being corrupted:

Following the stack:

#2 [ffff880036a73c30] xfs_log_quiesce at ffffffffa02e0b67 [xfs]
#3 [ffff880036a73c50] xfs_log_unmount at ffffffffa02e0bb6 [xfs]

We can see that xfs_log_quiesce calls

#1 [ffff880036a73bc0] xfs_ail_push_all_sync at ffffffffa02e5478 [xfs]

which is the function responsible for pushing ALL items in the AIL
(Active Item List) to disk (for unmount and freeze purposes).

The function itself is simple:

600     struct xfs_log_item *lip;
601     DEFINE_WAIT(wait);
602
603     spin_lock(&ailp->xa_lock);
604     while ((lip = xfs_ail_max(ailp)) != NULL) {
605         prepare_to_wait(&ailp->xa_empty, &wait, TASK_UNINTERRUPTIBLE);
606         ailp->xa_target = lip->li_lsn;
607         wake_up_process(ailp->xa_task);
608         spin_unlock(&ailp->xa_lock);
609         schedule();
610         spin_lock(&ailp->xa_lock);
611     }
612     spin_unlock(&ailp->xa_lock);
613
614     finish_wait(&ailp->xa_empty, &wait);

As long as the AIL doubly linked list is not empty, it takes the last
"xfs_log_item", sets that item's LSN as the push target, and wakes up the
task responsible for committing these log items to disk, then sleeps:

607 wake_up_process(ailp->xa_task);

So there are 2 possible things happening:

1) XFS is stuck inside this loop because of something happening in the
AIL task (xa_task), which is responsible for committing the xfs log items.
This, of course, makes the "mount" process hang in the "UNINTERRUPTIBLE"
state (since it's not safe to let userland kill this process).

OBS: We cannot analyze this further because we lack the "core" file (which
would give us the stack trace of the kernel thread responsible for the
callback), despite our efforts to get it during the crisis.

2) XFS is stuck inside this loop because there are many log items
still to be committed (as could happen in a stress test scenario) that
have not been committed yet.

OBS: We cannot confirm this because we lack the sosreport, which could
show us that the task has been blocked for more than XX seconds.
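
The two scenarios can be told apart with a small userspace model of the
wait loop, sketched below in C. This is an illustration under stated
assumptions, not kernel code: model_ail, push_per_wake and the max_wakeups
bound are all hypothetical. The real loop has no such bound and simply
keeps sleeping while the AIL is non-empty.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the AIL; none of these names exist in the kernel. */
struct model_ail {
    size_t nitems;        /* log items still sitting in the AIL */
    size_t push_per_wake; /* items written back per wakeup; 0 = pusher stuck */
};

/* Stands in for waking the AIL task and sleeping until it has run once. */
static void model_wake_pusher_and_sleep(struct model_ail *ailp)
{
    size_t done = ailp->push_per_wake < ailp->nitems
                      ? ailp->push_per_wake
                      : ailp->nitems;

    ailp->nitems -= done;
}

/*
 * Same shape as the loop in the excerpt: keep waiting while the AIL is
 * non-empty. Returns the number of wakeups it took to drain the AIL, or
 * -1 if the AIL was still non-empty after max_wakeups (the model's
 * stand-in for "the mount process hangs").
 */
static long model_push_all_sync(struct model_ail *ailp, long max_wakeups)
{
    long wakeups = 0;

    assert(ailp != NULL);
    while (ailp->nitems != 0) {
        if (wakeups == max_wakeups)
            return -1;
        model_wake_pusher_and_sleep(ailp);
        wakeups++;
    }
    return wakeups;
}
```

With push_per_wake set to 0 (scenario 1) the model never drains the AIL,
no matter how large the bound is. With a small but non-zero value
(scenario 2) it finishes eventually, after a number of wakeups that grows
with the number of pending items.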

https://bugs.launchpad.net/bugs/1382801

Title:
  XFS: mount hangs for corrupted filesystem
