Re: [zfs-discuss] Zones on shared storage - a warning

2010-01-07 Thread Mike Gerdts
[removed zones-discuss after sending heads-up that the conversation
will continue at zfs-discuss]

On Mon, Jan 4, 2010 at 5:16 PM, Cindy Swearingen
cindy.swearingen@sun.com wrote:
 Hi Mike,

 It is difficult to comment on the root cause of this failure since
 the interactions among these features are unknown. You might
 consider seeing how Ed's proposal plays out and let him do some more
 testing...

Unfortunately, Ed's proposal was not funded, last I heard.  Ops Center
uses many of the same mechanisms for putting zones on ZFS; that is
where I first saw the problem.

 If you are interested in testing this with NFSv4 and it still fails
 the same way, then also consider testing with a local file instead
 of an NFS-mounted file and let us know the results. I'm also unsure
 about using the same path for the pool and the zone root path,
 rather than one path for the pool and a pool/dataset path for the
 zone root. I will test this myself if I get some time.

I have been unable to reproduce this with a local file.  I have been
able to reproduce it with NFSv4 on build 130.  Rather surprisingly,
the actual checksums found in the ereports are sometimes 0x0 0x0 0x0
0x0 or 0xbaddcafe00 ...

Here's what I did:

- Install OpenSolaris build 130 (ldom on T5220)
- Mount some NFS space at /nfszone:
   mount -F nfs -o vers=4 $server:/path /nfszone
- Create a 10 GB sparse file
   cd /nfszone
   mkfile -n 10g root
- Create a zpool
   zpool create -m /zones/nfszone nfszone /nfszone/root
- Configure and install a zone
   zonecfg -z nfszone
      create
      set zonepath=/zones/nfszone
      set autoboot=false
      verify
      commit
      exit
   chmod 700 /zones/nfszone
   zoneadm -z nfszone install
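
For convenience, here is the same sequence as a single non-interactive
script.  This is a sketch rather than a tested transcript: the server
and export names are placeholders, and it assumes zonecfg's
semicolon-separated command-string form.

   #!/bin/ksh
   # Reproduction sketch -- SERVER and EXPORT are placeholders
   SERVER=nfsserver
   EXPORT=/export/nfszone
   mount -F nfs -o vers=4 $SERVER:$EXPORT /nfszone
   mkfile -n 10g /nfszone/root      # 10 GB sparse backing file
   zpool create -m /zones/nfszone nfszone /nfszone/root
   zonecfg -z nfszone \
       'create; set zonepath=/zones/nfszone; set autoboot=false; verify; commit'
   chmod 700 /zones/nfszone
   zoneadm -z nfszone install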

- Verify that the nfszone pool is clean.  First, pkg history in the
zone shows the timestamp of the last package operation:

  2010-01-07T20:27:07 install   pkg Succeeded
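
The same timestamp can also be read from the global zone; a sketch,
assuming pkg accepts the zone's image root via -R:

   # Read the zone's pkg history from the global zone
   pkg -R /zones/nfszone/root history | tail -1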

At 20:31 I ran:

# zpool status nfszone
  pool: nfszone
 state: ONLINE
 scrub: none requested
config:

NAME STATE READ WRITE CKSUM
nfszone  ONLINE   0 0 0
  /nfszone/root  ONLINE   0 0 0

errors: No known data errors

I booted the zone.  By 20:32 it had accumulated 132 checksum errors:

 # zpool status nfszone
  pool: nfszone
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error.  An
attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: none requested
config:

NAME STATE READ WRITE CKSUM
nfszone  DEGRADED 0 0 0
  /nfszone/root  DEGRADED 0 0   132  too many errors

errors: No known data errors

fmdump has some very interesting things to say about the actual
checksums.  The 0x0 and 0xbaddcafe00 values seem to shout that these
checksum errors are not due to a couple of flipped bits.  (0xbaddcafe
is the pattern Solaris kmem debugging uses to mark uninitialized
memory, which points at uninitialized buffers rather than on-disk
corruption.)

# fmdump -eV | grep cksum_actual | sort | uniq -c | sort -n | tail
   2  cksum_actual = 0x14c538b06b6 0x2bb571a06ddb0 0x3e05a7c4ac90c62 0x290cbce13fc59dce
   3  cksum_actual = 0x175bb95fc00 0x1767673c6fe00 0xfa9df17c835400 0x7e0aef335f0c7f00
   3  cksum_actual = 0x2eb772bf800 0x5d8641385fc00 0x7cf15b214fea800 0xd4f1025a8e66fe00
   4  cksum_actual = 0x0 0x0 0x0 0x0
   4  cksum_actual = 0x1d32a7b7b00 0x248deaf977d80 0x1e8ea26c8a2e900 0x330107da7c4bcec0
   5  cksum_actual = 0x14b8f7afe6 0x915db8d7f87 0x205dc7979ad73 0x4e0b3a8747b8a8
   6  cksum_actual = 0x1184cb07d00 0xd2c5aab5fe80 0x69ef5922233f00 0x280934efa6d20f40
   6  cksum_actual = 0x348e6117700 0x765aa1a547b80 0xb1d6d98e59c3d00 0x89715e34fbf9cdc0
  16  cksum_actual = 0xbaddcafe00 0x5dcc54647f00 0x1f82a459c2aa00 0x7f84b11b3fc7f80
  48  cksum_actual = 0x5d6ee57f00 0x178a70d27f80 0x3fc19c3a19500 0x82804bc6ebcfc0
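
To line the expected checksums up against these, the ereports can be
filtered by class; a sketch, assuming the events carry the class
ereport.fs.zfs.checksum:

   # Show expected vs. actual checksum pairs from ZFS checksum ereports
   fmdump -eV -c ereport.fs.zfs.checksum | \
       egrep 'cksum_(expected|actual)' | sort | uniq -c | sort -n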

I halted the zone, exported the pool, imported the pool, then did a
scrub.  Everything seemed to be OK:

# zpool export nfszone
# zpool import -d /nfszone nfszone
# zpool status nfszone
  pool: nfszone
 state: ONLINE
 scrub: none requested
config:

NAME STATE READ WRITE CKSUM
nfszone  ONLINE   0 0 0
  /nfszone/root  ONLINE   0 0 0

errors: No known data errors
# zpool scrub nfszone
# zpool status nfszone
  pool: nfszone
 state: ONLINE
 scrub: scrub completed after 0h0m with 0 errors on Thu Jan  7 21:56:47 2010
config:

NAME STATE READ WRITE CKSUM
nfszone  ONLINE   0 0 0
  /nfszone/root  ONLINE   0 0 0

errors: No known data errors

But then I booted the zone...

# zoneadm -z nfszone boot
# zpool status nfszone
  pool: nfszone
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
attempt was made to correct the error.  Applications are unaffected.

Re: [zfs-discuss] Zones on shared storage - a warning

2010-01-07 Thread Cindy Swearingen

Hi Mike,

I can't really speak for how virtualization products are using
files for pools, but we don't recommend creating pools on files,
much less NFS-mounted files and then building zones on top.

File-based pool configurations might be used for limited internal
testing of some features, but our product testing does not include
testing storage pools on files or NFS-mounted files.

Unless Ed's project gets refunded, I'm not sure how much further
you can go with this approach.

Thanks,

Cindy

On 01/07/10 15:05, Mike Gerdts wrote:

[snip]
Re: [zfs-discuss] Zones on shared storage - a warning

2010-01-07 Thread Edward Pilatowicz
hey mike/cindy,

i've gone ahead and filed a zfs rfe on this functionality:
6915127 need full support for zfs pools on files

implementing this rfe is a requirement for supporting encapsulated
zones on shared storage.

ed

On Thu, Jan 07, 2010 at 03:26:17PM -0700, Cindy Swearingen wrote:
[snip]

Re: [zfs-discuss] Zones on shared storage - a warning

2010-01-04 Thread Cindy Swearingen

Hi Mike,

It is difficult to comment on the root cause of this failure since
the interactions among these features are unknown. You might
consider seeing how Ed's proposal plays out and let him do some more
testing...

If you are interested in testing this with NFSv4 and it still fails
the same way, then also consider testing with a local file instead
of an NFS-mounted file and let us know the results. I'm also unsure
about using the same path for the pool and the zone root path,
rather than one path for the pool and a pool/dataset path for the
zone root. I will test this myself if I get some time.
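
Concretely, that split might look something like this (the local path
and dataset names are illustrative):

   # Local backing file instead of an NFS-mounted one
   mkfile -n 10g /var/tmp/nfszone-backing
   zpool create -m /zones nfszone /var/tmp/nfszone-backing
   # Give the zone root its own dataset so the pool path and the
   # zonepath are distinct
   zfs create -o mountpoint=/zones/nfszone nfszone/root
   chmod 700 /zones/nfszone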

Thanks,

Cindy

On 12/22/09 20:34, Mike Gerdts wrote:

[snip]




[zfs-discuss] Zones on shared storage - a warning

2009-12-22 Thread Mike Gerdts
I've been playing around with zones on NFS a bit and have run into
what looks to be a pretty bad snag: ZFS keeps seeing read and/or
checksum errors.  This happens with both S10u8 and OpenSolaris dev
build snv_129.  It is likely a blocker for anyone thinking of
implementing parts of Ed's Zones on Shared Storage:

http://hub.opensolaris.org/bin/view/Community+Group+zones/zoss

The OpenSolaris example appears below.  The order of events is:

1) Create a file on NFS, turn it into a zpool
2) Configure a zone with the pool as zonepath
3) Install the zone, verify that the pool is healthy
4) Boot the zone, observe that the pool is sick

root@soltrain19# mount filer:/path /mnt
root@soltrain19# cd /mnt
root@soltrain19# mkdir osolzone
root@soltrain19# mkfile -n 8g osolzone/root
root@soltrain19# zpool create -m /zones/osol osol /mnt/osolzone/root
root@soltrain19# zonecfg -z osol
osol: No such zone configured
Use 'create' to begin configuring a new zone.
zonecfg:osol> create
zonecfg:osol> info
zonename: osol
zonepath:
brand: ipkg
autoboot: false
bootargs:
pool:
limitpriv:
scheduling-class:
ip-type: shared
hostid:
zonecfg:osol> set zonepath=/zones/osol
zonecfg:osol> set autoboot=false
zonecfg:osol> verify
zonecfg:osol> commit
zonecfg:osol> exit

root@soltrain19# chmod 700 /zones/osol

root@soltrain19# zoneadm -z osol install
   Publisher: Using opensolaris.org (http://pkg.opensolaris.org/dev/
http://pkg-na-2.opensolaris.org/dev/).
   Publisher: Using contrib (http://pkg.opensolaris.org/contrib/).
       Image: Preparing at /zones/osol/root.
       Cache: Using /var/pkg/download.
Sanity Check: Looking for 'entire' incorporation.
  Installing: Core System (output follows)
DOWNLOAD                        PKGS       FILES    XFER (MB)
Completed                      46/46 12334/12334    93.1/93.1

PHASE                                   ACTIONS
Install Phase                       18277/18277
No updates necessary for this image.
  Installing: Additional Packages (output follows)
DOWNLOAD                        PKGS       FILES    XFER (MB)
Completed                      36/36   3339/3339    21.3/21.3

PHASE                                   ACTIONS
Install Phase                         4466/4466

Note: Man pages can be obtained by installing SUNWman
 Postinstall: Copying SMF seed repository ... done.
 Postinstall: Applying workarounds.
        Done: Installation completed in 2139.186 seconds.

  Next Steps: Boot the zone, then log into the zone console (zlogin -C)
              to complete the configuration process.
Boot the OpenSolaris zone:

root@soltrain19# zpool status osol
  pool: osol
 state: ONLINE
 scrub: none requested
config:

NAME  STATE READ WRITE CKSUM
osol  ONLINE   0 0 0
  /mnt/osolzone/root  ONLINE   0 0 0

errors: No known data errors

root@soltrain19# zoneadm -z osol boot

root@soltrain19# zpool status osol
  pool: osol
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error.  An
attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: none requested
config:

NAME  STATE READ WRITE CKSUM
osol  DEGRADED 0 0 0
  /mnt/osolzone/root  DEGRADED 0 0   117  too many errors

errors: No known data errors

root@soltrain19# zlogin osol uptime
  5:31pm  up 1 min(s),  0 users,  load average: 0.69, 0.38, 0.52
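
For anyone reproducing this, a crude way to watch the error counter
climb while the zone runs; a sketch, adjust the pool and device names
to match your setup:

   # Poll the backing vdev's error counters every 10 seconds
   while :; do
       zpool status osol | grep osolzone/root
       sleep 10
   done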


-- 
Mike Gerdts
http://mgerdts.blogspot.com/


Re: [zfs-discuss] Zones on shared storage - a warning

2009-12-22 Thread Mike Gerdts
On Tue, Dec 22, 2009 at 8:02 PM, Mike Gerdts mgerdts@gmail.com wrote:
[snip]

An off-list conversation and a bit of digging into other tests I have
done suggest that this is likely limited to NFSv3.  I cannot say that
this problem has been seen with NFSv4.
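
To confirm which protocol version a given mount actually negotiated,
nfsstat should show it; a quick check, with /mnt standing in for the
mount point from the example above:

   # Show per-mount NFS options, including the negotiated version (vers=)
   nfsstat -m /mnt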

-- 
Mike Gerdts
http://mgerdts.blogspot.com/