[zfs-discuss] time-sliderd doesn't remove snapshots

2011-02-18 Thread Bill Shannon

In the last few days my performance has gone to hell.  I'm running:

# uname -a
SunOS nissan 5.11 snv_150 i86pc i386 i86pc

(I'll upgrade as soon as the desktop hang bug is fixed.)

The performance problems seem to be due to excessive I/O on the main
disk/pool.

The only things I've changed recently are that I created and destroyed
a snapshot, and I ran zpool upgrade.

Here's what I'm seeing:

# zpool iostat rpool 5
               capacity     operations    bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
rpool       13.3G   807M      7     85  15.9K   548K
rpool       13.3G   807M      3     89  1.60K   723K
rpool       13.3G   810M      5     91  5.19K   741K
rpool       13.3G   810M      3     94  2.59K   756K

Using iofileb.d from the DTrace Toolkit shows:

# iofileb.d
Tracing... Hit Ctrl-C to end.
^C
  PID CMD              KB FILE
    0 sched             6 none
    5 zpool-rpool    7770 none

zpool status doesn't show any problems:

# zpool status rpool
  pool: rpool
 state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        rpool       ONLINE       0     0     0
          c3d0s0    ONLINE       0     0     0


Perhaps related to this or perhaps not, I discovered recently that time-sliderd
was doing just a ton of close requests.  I disabled time-sliderd while trying
to solve my performance problem.

I was also getting these error messages in the time-sliderd log file:

Warning: Cleanup failed to destroy: 
rpool/ROOT@zfs-auto-snap_hourly-2010-11-10-15h01
Details:
['/usr/bin/pfexec', '/usr/sbin/zfs', 'destroy', '-d', 
'rpool/ROOT@zfs-auto-snap_hourly-2010-11-10-15h01'] failed with exit code 1
cannot destroy 'rpool/ROOT@zfs-auto-snap_hourly-2010-11-10-15h01': unsupported version


That was the reason I did the zpool upgrade.

I discovered that I had a *ton* of snapshots from time-slider that
hadn't been destroyed, over 6500 of them, presumably all because of this
version problem?

I manually removed all the snapshots and my performance returned to normal.
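For reference, one way to script that kind of bulk cleanup (a sketch, not the exact commands used; the snapshot-name pattern is an assumption based on the log messages above):

```shell
# List every snapshot name (no header, one per line), keep only the
# time-slider auto-snapshots (their names contain "@zfs-auto-snap"),
# and destroy them one at a time.  Prepend "echo" to "zfs destroy"
# for a dry run before deleting anything.
zfs list -H -t snapshot -o name \
    | grep '@zfs-auto-snap' \
    | xargs -n 1 zfs destroy
```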

I don't quite understand what the -d option to zfs destroy does.
Why does time-sliderd use it, and why does it prevent these snapshots
from being destroyed?

Shouldn't time-sliderd detect that it can't destroy any of the snapshots
it's created and stop creating snapshots?

And since I don't quite understand why time-sliderd was failing to begin with,
I'm nervous about re-enabling it.  Do I need to do a zpool upgrade on all
my pools to make it work?
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] time-sliderd doesn't remove snapshots

2011-02-18 Thread Cindy Swearingen

Hi Bill,

I think the root cause of this problem is that time-slider uses the
zfs destroy -d feature, but that feature is only available in later
pool versions. This means the routine removal of time-slider-generated
snapshots fails on older pool versions.


The zfs destroy -d feature (snapshot user holds) was introduced in pool 
version 18.
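A quick way to check a pool before re-enabling time-slider (a sketch, not part of the original thread; the awk parsing assumes the usual NAME/PROPERTY/VALUE/SOURCE output of zpool get):

```shell
#!/bin/sh
# Report whether a pool's on-disk version supports "zfs destroy -d"
# (snapshot user holds, introduced in pool version 18).
pool=${1:-rpool}                # pool name; "rpool" is just an example
ver=$(zpool get version "$pool" | awk 'NR == 2 { print $3 }')
if [ "$ver" -ge 18 ]; then
    echo "$pool: version $ver, deferred destroy supported"
else
    echo "$pool: version $ver, too old for zfs destroy -d"
fi
```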


I think this bug describes some or all of the problem:

https://defect.opensolaris.org/bz/show_bug.cgi?id=16361

Thanks,

Cindy



Re: [zfs-discuss] time-sliderd doesn't remove snapshots

2011-02-18 Thread Bill Shannon

One of my old pools was version 10, another was version 13.
I guess that explains the problem.

Seems like time-sliderd should refuse to run on pools that
aren't of a sufficient version.
