Hi Bill,

I think the root cause of this problem is that time-slider now uses the
zfs destroy -d feature, but this feature is only available in later pool
versions. This means that the routine removal of time-slider-generated
snapshots fails on pools running older versions.

The zfs destroy -d feature (snapshot user holds) was introduced in pool version 18.
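
If it helps, here's a quick way to check where a pool stands and to see how
the user hold / deferred destroy behavior works. The pool, snapshot, and hold
tag names below are just examples, not your actual datasets:

# zpool get version rpool             (current on-disk version of the pool)
# zpool upgrade -v                    (versions this software supports;
                                       user holds are listed at version 18)
# zfs hold keep rpool/export@snap1    (place a user hold on a snapshot)
# zfs destroy rpool/export@snap1      (fails while the hold exists)
# zfs destroy -d rpool/export@snap1   (marks the snapshot for deferred destroy)
# zfs release keep rpool/export@snap1 (last hold released; snapshot goes away)

The -d option just means "defer the destroy until the last hold or clone is
gone" rather than failing. On a pool older than version 18 the command fails
outright with "unsupported version", which matches the error in your
time-sliderd log. Once the pool is at version 18 or later, those deferred
destroys should start working again.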

I think this bug describes some or all of the problem:

https://defect.opensolaris.org/bz/show_bug.cgi?id=16361

Thanks,

Cindy



On 02/18/11 12:34, Bill Shannon wrote:
In the last few days my performance has gone to hell.  I'm running:

# uname -a
SunOS nissan 5.11 snv_150 i86pc i386 i86pc

(I'll upgrade as soon as the desktop hang bug is fixed.)

The performance problems seem to be due to excessive I/O on the main
disk/pool.

The only things I've changed recently are that I created and destroyed
a snapshot, and I ran "zpool upgrade".

Here's what I'm seeing:

# zpool iostat rpool 5
                 capacity     operations    bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
rpool       13.3G   807M      7     85  15.9K   548K
rpool       13.3G   807M      3     89  1.60K   723K
rpool       13.3G   810M      5     91  5.19K   741K
rpool       13.3G   810M      3     94  2.59K   756K

Using iofileb.d from the dtrace toolkit shows:

# iofileb.d
Tracing... Hit Ctrl-C to end.
^C
     PID CMD              KB FILE
       0 sched             6 <none>
       5 zpool-rpool    7770 <none>

zpool status doesn't show any problems:

# zpool status rpool
    pool: rpool
   state: ONLINE
   scan: none requested
config:

          NAME        STATE     READ WRITE CKSUM
          rpool       ONLINE       0     0     0
            c3d0s0    ONLINE       0     0     0


Perhaps related to this or perhaps not, I discovered recently that time-sliderd
was doing just a ton of "close" requests.  I disabled time-sliderd while trying
to solve my performance problem.

I was also getting these error messages in the time-sliderd log file:

Warning: Cleanup failed to destroy: rpool/ROOT@zfs-auto-snap_hourly-2010-11-10-15h01
Details:
['/usr/bin/pfexec', '/usr/sbin/zfs', 'destroy', '-d', 'rpool/ROOT@zfs-auto-snap_hourly-2010-11-10-15h01'] failed with exit code 1
cannot destroy 'rpool/ROOT@zfs-auto-snap_hourly-2010-11-10-15h01': unsupported version

That was the reason I did the zpool upgrade.

I discovered that I had a *ton* of snapshots from time-slider that
hadn't been destroyed, over 6500 of them, presumably all because of this
version problem?

I manually removed all the snapshots and my performance returned to normal.

I don't quite understand what the "-d" option to "zfs destroy" does.
Why does time-sliderd use it, and why does it prevent these snapshots
from being destroyed?

Shouldn't time-sliderd detect that it can't destroy any of the snapshots
it's created and stop creating snapshots?

And since I don't quite understand why time-sliderd was failing to begin with,
I'm nervous about re-enabling it.  Do I need to do a "zpool upgrade" on all
my pools to make it work?
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
