[Kernel-packages] [Bug 1446064] Re: ISST-SAN: Filesystem converted into read only after interface failover
[Expired for linux (Ubuntu) because there has been no activity for 60 days.]

** Changed in: linux (Ubuntu)
       Status: Incomplete => Expired

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1446064

Title:
  ISST-SAN: Filesystem converted into read only after interface failover

Status in linux package in Ubuntu:
  Expired

Bug description:
  == Comment: #0 ==

  I was running interface failover tests on the Storage Texan2 (TMS9840).
  zop03-01 has disks coming from Texan2 via NPIV.

  Interface failover basically brings down an interface so that the other
  interfaces take over, waits for 10 min, and then brings it back up.
  There are 4 interfaces on texan2. While the third one was brought down,
  the IO on zop03-01 stopped.

  I was running tests to create directories and files on the FS, which
  were created on the multipath disks. The tests suddenly started to fail
  when the interface failover happened on the 3rd interface.

  Now I see that when I log in to the system, everything is read only.

  root@zop03-01:~# touch abc
  touch: cannot touch 'abc': Read-only file system
  root@zop03-01:~#

  root@zop03-01:~# lsb_release -sc; uname -m; uname -r
  vivid
  ppc64le
  3.19.0-9-generic

  root@zop03-01:~# less /etc/fstab
  # /etc/fstab: static file system information.
  #
  # Use 'blkid' to print the universally unique identifier for a
  # device; this may be used with UUID= as a more robust way to name devices
  # that works even if disks are added and removed. See fstab(5).
  #
  # <file system>            <mount point>  <type>  <options>                   <dump>  <pass>
  /dev/mapper/mpath10-part2  /              ext4    errors=remount-ro           0       1
  /dev/mapper/mpath10-part3  none           swap    sw                          0       0
  kte:/kte                   /kte           nfs     soft,rw,nolock,auto,exec    0       0
  kte:/data                  /data          nfs     soft,rw,nolock,auto,exec    0       0
  kte:/distros               /distros       nfs     soft,rw,nolock,auto,exec    0       0
  kte:/images                /images        nfs     soft,rw,nolock,auto,exec    0       0
  root@zop03-01:~#

  root@zop03-01:~# df -lh
  Filesystem      Size  Used Avail Use% Mounted on
  udev            7.5G     0  7.5G   0% /dev
  tmpfs           1.6G   38M  1.5G   3% /run
  /dev/sdah2       48G   29G   16G  65% /
  tmpfs           7.6G     0  7.6G   0% /dev/shm
  tmpfs           5.0M  128K  4.9M   3% /run/lock
  tmpfs           7.6G     0  7.6G   0% /sys/fs/cgroup
  tmpfs           1.6G     0  1.6G   0% /run/user/0
  root@zop03-01:~#

  root@zop03-01:~# fsck.ext4 /dev/sdah2
  e2fsck 1.42.12 (29-Aug-2014)
  /dev/sdah2: recovering journal
  /dev/sdah2 contains a file system with errors, check forced.
  Pass 1: Checking inodes, blocks, and sizes
  Pass 2: Checking directory structure
  Pass 3: Checking directory connectivity
  Pass 4: Checking reference counts
  Pass 5: Checking group summary information
  Free blocks count wrong (4957824, counted=5069276).
  Fix<y>? yes
  Free inodes count wrong (2952662, counted=2953167).
  Fix<y>? yes

  /dev/sdah2: ***** FILE SYSTEM WAS MODIFIED *****
  /dev/sdah2: ***** REBOOT LINUX *****
  /dev/sdah2: 186417/3139584 files (1.2% non-contiguous), 7488804/12558080 blocks
  root@zop03-01:~#

  == Comment: #6 ==

  There is no issue now; the system is fine. But I can recreate it
  easily: if I start the IO tests and start the interface failover tests
  in parallel, it can be recreated. I just don't want it to get into the
  same state again. If you run fsck and reboot, the problem goes away.

  We need to find the root cause of why the system goes into this bad
  state.

  == Comment: #8 ==

  Re-creating the issue with more details:

  The IO tests were running (the IO tests basically create directories
  and files on the FS). I started the interface failover tests. This
  basically fails an interface on the SAN subsystem and fails over to the
  next available interface. There are 4 such interfaces available. The
  SAN is texan2.

  When the interface was failed, the IO halted for a moment. By that I
  mean:

  Created Directory /FS3-part1/test1 rc = 0 at 04/17/2015 12:24:05
  Created Directory /FS4-part1/test1 rc = 0 at 04/17/2015 12:24:05
  Created Directory /FS3-part2/test1 rc = 0 at 04/17/2015 12:24:05
  Created Directory /FS1-part3/test1 rc = 0 at 04/17/2015 12:24:05
  Created Directory /FS1-part1/test1 rc = 0 at 04/17/2015 12:24:06
  Created Directory /FS7-part1/test1 rc = 0 at 04/17/2015 12:24:07
  Created Directory /FS0-part2/test1 rc = 0 at 04/17/2015 12:24:08
  Created Directory /FS0-part1/test1 rc = 0 at 04/17/2015 12:24:08
  Created Directory /FS0-part3/test1 rc = 0 at 04/17/2015 12:24:09
  === Here it halted

  On the console, I was seeing messages like:

  root@zop03-01:~# [35488.779299] sd 4:0:0:5: [sdab] Command (2A) failed: transaction cancelled (200:600) flags: 0 fcp_rsp: 0, resid=0, scsi_status: 0
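
For anyone retesting, it may help to check which ext4 mounts have flipped to read-only after a failover, since the root filesystem is mounted with errors=remount-ro. The following is only a minimal sketch and not part of the original report: the function name ro_ext4_mounts is hypothetical, and it assumes a /proc/mounts-style table (device, mount point, type, options, dump, pass).

```shell
#!/bin/sh
# Hypothetical helper (not from the original report): print the mount point
# of every ext4 filesystem whose active mount options include "ro".
# Usage: ro_ext4_mounts [FILE]   (FILE defaults to /proc/mounts)
ro_ext4_mounts() {
    awk '$3 == "ext4" {
        # Split the comma-separated option list and compare each option
        # exactly, so that "errors=remount-ro" alone does not match.
        n = split($4, opts, ",")
        for (i = 1; i <= n; i++)
            if (opts[i] == "ro") { print $2; break }
    }' "${1:-/proc/mounts}"
}
```

Calling ro_ext4_mounts with no argument reads the live mount table; an empty result would suggest that no ext4 filesystem is currently mounted read-only.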
[Kernel-packages] [Bug 1446064] Re: ISST-SAN: Filesystem converted into read only after interface failover
Since there have been some updates to multipath, can this be retested with the latest linux/multipath-tools etc?

** Changed in: linux (Ubuntu)
       Status: Confirmed => Incomplete
[Kernel-packages] [Bug 1446064] Re: ISST-SAN: Filesystem converted into read only after interface failover
** Changed in: linux (Ubuntu)
       Status: New => Confirmed

** Changed in: linux (Ubuntu)
   Importance: Undecided => Medium
[Kernel-packages] [Bug 1446064] Re: ISST-SAN: Filesystem converted into read only after interface failover
Reassigning to linux; this seems most likely to be a filesystem/ibmvfc driver issue.

** Package changed: ubuntu => linux (Ubuntu)