Re: [lustre-discuss] lustre 2.10.5 or 2.11.0
On 10/19/18 12:37 PM, Mohr Jr, Richard Frank (Rick Mohr) wrote:
> On Oct 17, 2018, at 7:30 PM, Riccardo Veraldi wrote:
>> anyway especially regarding the OSSes you may eventually need some ZFS module parameter optimizations, raising vdev_write and vdev_read max above the defaults. You may also disable ZIL, change redundant_metadata to "most", and set atime off.
>>
>> I could send you a list of parameters that in my case work well.
>
> Riccardo,
>
> Would you mind sharing your ZFS parameters with the mailing list? I would be interested to see which options you have changed.
>
> --
> Rick Mohr
> Senior HPC System Administrator
> National Institute for Computational Sciences
> http://www.nics.tennessee.edu

This worked for me on my high-performance cluster:

options zfs zfs_prefetch_disable=1
options zfs zfs_txg_history=120
options zfs metaslab_debug_unload=1
# options zfs zfs_vdev_scheduler=deadline
options zfs zfs_vdev_async_write_active_min_dirty_percent=20
# options zfs zfs_vdev_scrub_min_active=48
options zfs zfs_vdev_scrub_max_active=128
# options zfs zfs_vdev_sync_write_min_active=8
options zfs zfs_vdev_sync_write_max_active=32
options zfs zfs_vdev_sync_read_min_active=8
options zfs zfs_vdev_sync_read_max_active=32
options zfs zfs_vdev_async_read_min_active=8
options zfs zfs_vdev_async_read_max_active=32
options zfs zfs_top_maxinflight=320
options zfs zfs_txg_timeout=30
options zfs zfs_dirty_data_max_percent=40
options zfs zfs_vdev_async_write_min_active=8
options zfs zfs_vdev_async_write_max_active=32

## These are the ZFS attributes that I changed on the OSSes:

zfs set mountpoint=none $ostpool
zfs set sync=disabled $ostpool
zfs set atime=off $ostpool
zfs set redundant_metadata=most $ostpool
zfs set xattr=sa $ostpool
zfs set recordsize=1M $ostpool

# These are the ko2iblnd parameters for FDR Mellanox IB interfaces:

options ko2iblnd timeout=100 peer_credits=63 credits=2560 concurrent_sends=63 ntx=2048 fmr_pool_size=1280 fmr_flush_trigger=1024 ntx=5120

These are the ksocklnd parameters:

options ksocklnd sock_timeout=100 credits=2560 peer_credits=63

## These are other parameters that I tweaked:

echo 32 > /sys/module/ptlrpc/parameters/max_ptlrpcds
echo 3 > /sys/module/ptlrpc/parameters/ptlrpcd_bind_policy
lctl set_param timeout=600
lctl set_param ldlm_timeout=200
lctl set_param at_min=250
lctl set_param at_max=600

### Also, I run this script at boot time to redefine IRQ assignments for the hard drives, spread across all CPUs; not needed for kernel > 4.4:

#!/bin/sh
# numa_smp.sh
device=$1
cpu1=$2
cpu2=$3
cpu=$cpu1
grep $device /proc/interrupts | awk '{print $1}' | sed 's/://' | while read int
do
    echo $cpu > /proc/irq/$int/smp_affinity_list
    echo "echo CPU $cpu > /proc/irq/$int/smp_affinity_list"
    if [ $cpu = $cpu2 ]
    then
        cpu=$cpu1
    else
        cpu=$((cpu + 1))
    fi
done

_______________________________________________
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
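Incidentally, long `options` lines like the ko2iblnd one above are easy to get wrong: it lists ntx twice (2048, then 5120), and with kernel module parameters the later assignment is normally the one that takes effect. A quick way to catch that kind of slip is to scan the file for duplicated keys; the snippet below is a hypothetical helper, not part of any Lustre or ZFS tooling:

```shell
#!/bin/sh
# Flag duplicated keys on modprobe 'options' lines.
check_dupes() {
    awk '$1 == "options" {
        for (i = 3; i <= NF; i++) {
            split($i, kv, "=")
            k = $2 ":" kv[1]
            if (seen[k]++) print "duplicate:", $2, kv[1]
        }
    }'
}

# Sample input reproducing the duplicated ntx from the message above:
conf='
options ko2iblnd timeout=100 peer_credits=63 ntx=2048 fmr_pool_size=1280 ntx=5120
options zfs zfs_txg_timeout=30
'
echo "$conf" | check_dupes
```

Running it against a real file would be `check_dupes < /etc/modprobe.d/lustre.conf` (path is a placeholder).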
Re: [lustre-discuss] lustre 2.10.5 or 2.11.0
> On Oct 17, 2018, at 7:30 PM, Riccardo Veraldi wrote:
>
> anyway especially regarding the OSSes you may eventually need some ZFS module parameter optimizations, raising vdev_write and vdev_read max above the defaults. You may also disable ZIL, change redundant_metadata to "most", and set atime off.
>
> I could send you a list of parameters that in my case work well.

Riccardo,

Would you mind sharing your ZFS parameters with the mailing list? I would be interested to see which options you have changed.

--
Rick Mohr
Senior HPC System Administrator
National Institute for Computational Sciences
http://www.nics.tennessee.edu
Re: [lustre-discuss] LU-11465 OSS/MDS deadlock in 2.10.5
There is a somewhat hidden danger with eviction: you can get silent data loss.

The simplest example is buffered writes (i.e., any writes that aren't direct I/O). Lustre reports completion (i.e., your write() syscall returns) once the data is in the page cache on the client, like any modern file system, including local ones; you can get silent data loss on EXT4, XFS, ZFS, etc., if your disk becomes unavailable before data is written out of the page cache. So if the client with pending dirty data is evicted from the OST the data is destined for (which is essentially what aborting recovery does), that data is lost, and the application doesn't get an error, because the write() call has already completed. A message is printed to the console on the client in this case, but you have to know to look for it. The application will run to completion, but you may still experience data loss and not know it.

It's just something to keep in mind: applications that run to completion despite evictions may not have completed *successfully*.

- Patrick

On 10/19/18, 11:42 AM, "lustre-discuss on behalf of Mohr Jr, Richard Frank (Rick Mohr)" wrote:

> On Oct 19, 2018, at 10:42 AM, Marion Hakanson wrote:
>
> Thanks for the feedback. You're both confirming what we've learned so far, that we had to unmount all the clients (which required rebooting most of them), then reboot all the storage servers, to get things unstuck until the problem recurred.
>
> I tried abort_recovery on the clients last night, before rebooting the MDS, but that did not help. Could well be I'm not using it right:

Aborting recovery is a server-side action, not something that is done on the client. As mentioned by Peter, you can abort recovery on a single target after it is mounted by using "lctl --device abort_recover". But you can also just skip over the recovery step when the target is mounted by adding the "-o abort_recov" option to the mount command. For example:

mount -t lustre -o abort_recov /dev/my/mdt /mnt/lustre/mdt0

And similarly for OSTs. So you should be able to just unmount your MDT/OST on the running file system and then remount with the abort_recov option.

From a client perspective, the lustre client will get evicted but should automatically reconnect. Some applications can ride through a client eviction without causing issues, some cannot. I think it depends largely on how the application does its IO and if there is any IO in flight when the eviction occurs. I have had to do this a few times on a running cluster, and in my experience we have had good luck with most of the applications continuing without issues. Sometimes a few jobs abort, but overall this is better than having to stop all jobs and remount lustre on all the compute nodes.

--
Rick Mohr
Senior HPC System Administrator
National Institute for Computational Sciences
http://www.nics.tennessee.edu
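The per-target sequence Rick describes can be sketched as a small helper. This is a dry run (it only prints the commands rather than executing them); the device and mountpoint are placeholders, and on a real system you would repeat it for each MDT/OST:

```shell
#!/bin/sh
# Dry-run sketch of remounting a Lustre target with recovery skipped.
# Remove the leading 'echo's to execute for real; paths are placeholders.
remount_abort_recov() {
    dev="$1"
    mnt="$2"
    echo umount "$mnt"
    echo mount -t lustre -o abort_recov "$dev" "$mnt"
}

remount_abort_recov /dev/my/mdt /mnt/lustre/mdt0
```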
Re: [lustre-discuss] LU-11465 OSS/MDS deadlock in 2.10.5
> On Oct 19, 2018, at 10:42 AM, Marion Hakanson wrote:
>
> Thanks for the feedback. You're both confirming what we've learned so far, that we had to unmount all the clients (which required rebooting most of them), then reboot all the storage servers, to get things unstuck until the problem recurred.
>
> I tried abort_recovery on the clients last night, before rebooting the MDS, but that did not help. Could well be I'm not using it right:

Aborting recovery is a server-side action, not something that is done on the client. As mentioned by Peter, you can abort recovery on a single target after it is mounted by using "lctl --device abort_recover". But you can also just skip over the recovery step when the target is mounted by adding the "-o abort_recov" option to the mount command. For example:

mount -t lustre -o abort_recov /dev/my/mdt /mnt/lustre/mdt0

And similarly for OSTs. So you should be able to just unmount your MDT/OST on the running file system and then remount with the abort_recov option.

From a client perspective, the lustre client will get evicted but should automatically reconnect. Some applications can ride through a client eviction without causing issues, some cannot. I think it depends largely on how the application does its IO and if there is any IO in flight when the eviction occurs. I have had to do this a few times on a running cluster, and in my experience we have had good luck with most of the applications continuing without issues. Sometimes a few jobs abort, but overall this is better than having to stop all jobs and remount lustre on all the compute nodes.

--
Rick Mohr
Senior HPC System Administrator
National Institute for Computational Sciences
http://www.nics.tennessee.edu
Re: [lustre-discuss] LU-11465 OSS/MDS deadlock in 2.10.5
On Oct 19, 2018, at 08:42, Marion Hakanson wrote:
>
> Thanks for the feedback. You're both confirming what we've learned so far, that we had to unmount all the clients (which required rebooting most of them), then reboot all the storage servers, to get things unstuck until the problem recurred.
>
> I tried abort_recovery on the clients last night, before rebooting the MDS, but that did not help. Could well be I'm not using it right:
>
> - look up the MDT in the "lctl dl" list.
> - run "lctl abort_recovery $mdt" on all clients
> - reboot the MDS.
>
> The MDS still reported recovering all 259 clients at boot time.

The point of abort_recovery is to reset the server recovery engine without doing client recovery (i.e. tell it "don't try to recover these clients after the server restarted"). It should be run on the MDS and not the clients. Also, if you reboot the MDS then it will start recovery again, so don't do that...

> BTW, we have a separate MGS from the MDS. Could it be we need to reboot both MDS & MGS to clear things?
>
> Thanks and regards,
>
> Marion
>
>> On Oct 19, 2018, at 07:28, Peter Bortas wrote:
>> [...]
>>
>>> On Fri, Oct 19, 2018 at 3:04 PM Patrick Farrell wrote:
>>> [...]

Cheers, Andreas
---
Andreas Dilger
Principal Lustre Architect
Whamcloud
Re: [lustre-discuss] LU-11465 OSS/MDS deadlock in 2.10.5
Sigh. Instructions that I've found for that have been a bit on the slim side (:-). We'll give it a try.

Thanks and regards,

Marion

> On Oct 19, 2018, at 07:59, Peter Bortas wrote:
>
> So, that is at least not a syntax for abort_recovery I'm familiar with. To take an example from last time I did this, I first determined which device wasn't completing the recovery, then logged in on the server (an OST in this case) and ran:
>
> # lctl dl | grep obdfilter | grep fouo5-OST
> 3 UP obdfilter fouo5-OST fouo5-OST_UUID 629
> # lctl --device 3 abort_recovery
>
> Attached is a script that you can invoke with "lustre_watch_recovery " that will give you the status of recovery on the named server, updated once per second. I find it useful for keeping track of how things are working out while doing restarts.
>
> Regards,
> --
> Peter Bortas, NSC
>
>> On Fri, Oct 19, 2018 at 4:42 PM Marion Hakanson wrote:
>> [...]
Re: [lustre-discuss] LU-11465 OSS/MDS deadlock in 2.10.5
So, that is at least not a syntax for abort_recovery I'm familiar with. To take an example from last time I did this, I first determined which device wasn't completing the recovery, then logged in on the server (an OST in this case) and ran:

# lctl dl | grep obdfilter | grep fouo5-OST
3 UP obdfilter fouo5-OST fouo5-OST_UUID 629
# lctl --device 3 abort_recovery

Attached is a script that you can invoke with "lustre_watch_recovery " that will give you the status of recovery on the named server, updated once per second. I find it useful for keeping track of how things are working out while doing restarts.

Regards,
--
Peter Bortas, NSC

> On Fri, Oct 19, 2018 at 4:42 PM Marion Hakanson wrote:
>
> Thanks for the feedback. You're both confirming what we've learned so far, that we had to unmount all the clients (which required rebooting most of them), then reboot all the storage servers, to get things unstuck until the problem recurred.
>
> I tried abort_recovery on the clients last night, before rebooting the MDS, but that did not help. Could well be I'm not using it right:
>
> - look up the MDT in the "lctl dl" list.
> - run "lctl abort_recovery $mdt" on all clients
> - reboot the MDS.
>
> The MDS still reported recovering all 259 clients at boot time.
>
> BTW, we have a separate MGS from the MDS. Could it be we need to reboot both MDS & MGS to clear things?
>
> Thanks and regards,
>
> Marion
>
>> On Oct 19, 2018, at 07:28, Peter Bortas wrote:
>> [...]

[Attachment: lustre_watch_recovery]
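Peter's attached script did not survive the archive, but a watcher like the one he describes can be sketched. This is a guess at the shape of such a tool, not a reconstruction of its actual contents; the status-fetching command is passed in as a parameter so it can be an ssh invocation, a local lctl call, or a stub:

```shell
#!/bin/sh
# Hypothetical recovery watcher: run a status-fetching command every
# $interval seconds, $count times (a real version might loop forever).
poll_recovery() {
    fetch="$1"
    interval="${2:-1}"
    count="${3:-5}"
    i=0
    while [ "$i" -lt "$count" ]; do
        $fetch
        i=$((i + 1))
        sleep "$interval"
    done
}

# On a real server the fetch step might be something like (assumption,
# parameter names vary by Lustre version):
#   poll_recovery "ssh oss1 lctl get_param -n obdfilter.*.recovery_status" 1 60
```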
Re: [lustre-discuss] LU-11465 OSS/MDS deadlock in 2.10.5
Thanks for the feedback. You're both confirming what we've learned so far, that we had to unmount all the clients (which required rebooting most of them), then reboot all the storage servers, to get things unstuck until the problem recurred.

I tried abort_recovery on the clients last night, before rebooting the MDS, but that did not help. Could well be I'm not using it right:

- look up the MDT in the "lctl dl" list.
- run "lctl abort_recovery $mdt" on all clients
- reboot the MDS.

The MDS still reported recovering all 259 clients at boot time.

BTW, we have a separate MGS from the MDS. Could it be we need to reboot both MDS & MGS to clear things?

Thanks and regards,

Marion

> On Oct 19, 2018, at 07:28, Peter Bortas wrote:
> [...]
>
>> On Fri, Oct 19, 2018 at 3:04 PM Patrick Farrell wrote:
>> [...]
Re: [lustre-discuss] LU-11465 OSS/MDS deadlock in 2.10.5
That should fix it, but I'd like to advocate for using abort_recovery. Compared to unmounting thousands of clients, abort_recovery is a quick operation that takes a few minutes to do. Wouldn't say it gets used a lot, but I've done it on NSC's live environment six times since 2016, solving the deadlocks each time.

Regards,
--
Peter Bortas
Swedish National Supercomputer Centre

> On Fri, Oct 19, 2018 at 3:04 PM Patrick Farrell wrote:
> [...]
Re: [lustre-discuss] LU-11465 OSS/MDS deadlock in 2.10.5
Marion,

You note the deadlock reoccurs on server reboot, so you're really stuck. This is most likely due to recovery, where operations from the clients are replayed.

If you're fine with letting any pending I/O fail in order to get the system back up, I would suggest a client-side action: unmount (-f, and be patient) and/or shut down all of your clients. That will discard the things the clients are trying to replay (causing pending I/O to fail). Then shut down your servers and start them up again. With no clients, there's (almost) nothing to replay, and you probably won't hit the issue on startup. (There's also the abort_recovery option covered in the manual, but I personally think this is easier.)

There's no guarantee this avoids your deadlock happening again, but it's highly likely it'll at least get you running.

If you need to save your pending I/O, you'll have to install patched software with a fix for this (sounds like WC has identified the bug) and then reboot.

Good luck!
- Patrick

From: lustre-discuss on behalf of Marion Hakanson
Sent: Friday, October 19, 2018 1:32:10 AM
To: lustre-discuss@lists.lustre.org
Subject: [lustre-discuss] LU-11465 OSS/MDS deadlock in 2.10.5

This issue is really kicking our behinds: https://jira.whamcloud.com/browse/LU-11465

While we're waiting for the issue to get some attention from Lustre developers, are there suggestions on how we can recover our cluster from this kind of deadlocked, stuck-threads-on-the-MDS (or OSS) situation? Rebooting the storage servers does not clear the hang-up, as upon reboot the MDS quickly ends up with the same number of D-state threads (around the same number as we have clients). It seems to me like there is some state stashed away in the filesystem which restores the deadlock as soon as the MDS comes up.

Thanks and regards,

Marion