[zfs-discuss] time-sliderd doesn't remove snapshots
In the last few days my performance has gone to hell. I'm running:

# uname -a
SunOS nissan 5.11 snv_150 i86pc i386 i86pc

(I'll upgrade as soon as the desktop hang bug is fixed.)

The performance problems seem to be due to excessive I/O on the main disk/pool. The only things I've changed recently are that I created and destroyed a snapshot, and I ran zpool upgrade. Here's what I'm seeing:

# zpool iostat rpool 5
               capacity     operations    bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
rpool       13.3G   807M      7     85  15.9K   548K
rpool       13.3G   807M      3     89  1.60K   723K
rpool       13.3G   810M      5     91  5.19K   741K
rpool       13.3G   810M      3     94  2.59K   756K

Using iofileb.d from the DTrace toolkit shows:

# iofileb.d
Tracing... Hit Ctrl-C to end.
^C
   PID CMD              KB FILE
     0 sched             6 none
     5 zpool-rpool    7770 none

zpool status doesn't show any problems:

# zpool status rpool
  pool: rpool
 state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        rpool       ONLINE       0     0     0
          c3d0s0    ONLINE       0     0     0

Perhaps related to this or perhaps not, I discovered recently that time-sliderd was doing a ton of close requests. I disabled time-sliderd while trying to solve my performance problem. I was also getting these error messages in the time-sliderd log file:

Warning: Cleanup failed to destroy: rpool/ROOT@zfs-auto-snap_hourly-2010-11-10-15h01
Details: ['/usr/bin/pfexec', '/usr/sbin/zfs', 'destroy', '-d', 'rpool/ROOT@zfs-auto-snap_hourly-2010-11-10-15h01'] failed with exit code 1
cannot destroy 'rpool/ROOT@zfs-auto-snap_hourly-2010-11-10-15h01': unsupported version

That was the reason I did the zpool upgrade.

I discovered that I had a *ton* of snapshots from time-slider that hadn't been destroyed, over 6500 of them, presumably all because of this version problem. I manually removed all the snapshots (roughly as sketched at the end of this message) and my performance returned to normal.

I don't quite understand what the -d option to zfs destroy does. Why does time-sliderd use it, and why does it prevent these snapshots from being destroyed? Shouldn't time-sliderd detect that it can't destroy any of the snapshots it creates and stop creating new ones?

And since I don't quite understand why time-sliderd was failing to begin with, I'm nervous about re-enabling it. Do I need to do a zpool upgrade on all my pools to make it work?
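For what it's worth, the manual cleanup mentioned above was roughly along these lines (just a sketch of the approach; the exact commands I used may have differed, and the grep pattern assumes the standard zfs-auto-snap naming):

# list the leftover time-slider snapshots
zfs list -H -t snapshot -o name | grep zfs-auto-snap > /tmp/snaps
wc -l /tmp/snaps
# destroy them one at a time; a plain destroy (no -d) doesn't
# require a newer pool version
for s in `cat /tmp/snaps`
do
        zfs destroy $s
done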
Re: [zfs-discuss] time-sliderd doesn't remove snapshots
Hi Bill,

I think the root cause of this problem is that time slider implemented the zfs destroy -d feature, but this feature is only available in later pool versions. This means that the routine removal of time-slider-generated snapshots fails on older pool versions.

The zfs destroy -d feature (snapshot user holds) was introduced in pool version 18.

I think this bug describes some or all of the problem:

https://defect.opensolaris.org/bz/show_bug.cgi?id=16361

Thanks,

Cindy

On 02/18/11 12:34, Bill Shannon wrote:
> I don't quite understand what the -d option to zfs destroy does.
> Why does time-sliderd use it, and why does it prevent these snapshots
> from being destroyed?
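As a rough illustration of what -d does on a pool that is already at version 18 or later (the dataset name rpool/export and the hold tag mytag below are just placeholders):

# take a snapshot and put a user hold on it
zfs snapshot rpool/export@example
zfs hold mytag rpool/export@example
zfs holds rpool/export@example

# a plain destroy fails while the hold exists
zfs destroy rpool/export@example

# destroy -d marks the snapshot for deferred destruction instead of
# failing; it goes away once the last hold is released
zfs destroy -d rpool/export@example
zfs release mytag rpool/export@example

Time-slider uses -d presumably so that a snapshot that is still held or cloned can be scheduled for removal rather than producing an error. On a pool older than version 18, though, the -d request itself fails with the "unsupported version" error you see in the log.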
Re: [zfs-discuss] time-sliderd doesn't remove snapshots
One of my old pools was version 10, another was version 13. I guess that explains the problem.

Seems like time-sliderd should refuse to run on pools that aren't at a sufficient version.

Cindy Swearingen wrote on 02/18/11 12:07 PM:
> I think the root cause of this problem is that time slider implemented
> the zfs destroy -d feature but this feature is only available in later
> pool versions. This means that the routine removal of time slider
> generated snapshots fails on older pool versions. The zfs destroy -d
> feature (snapshot user holds) was introduced in pool version 18.
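Presumably the way back to a working setup is something like this for each old pool (illustrative only; upgrading a pool is one-way, and the exact time-slider service FMRI may differ by build):

# show which pools are below the current on-disk version
zpool upgrade
# or check a specific pool
zpool get version rpool

# bring the pool up to the current version (irreversible; older
# software can no longer import the pool afterwards)
zpool upgrade rpool

# then re-enable the snapshot service
svcadm enable svc:/application/time-slider:default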