Re: [zfs-discuss] Zones on shared storage - a warning
[removed zones-discuss after sending a heads-up that the conversation will continue at zfs-discuss]

On Mon, Jan 4, 2010 at 5:16 PM, Cindy Swearingen cindy.swearin...@sun.com wrote:
> Hi Mike,
>
> It is difficult to comment on the root cause of this failure since the
> interactions between these features are unknown. You might consider
> seeing how Ed's proposal plays out and let him do some more testing...

Unfortunately, Ed's proposal is not funded, last I heard. Ops Center uses
many of the same mechanisms for putting zones on ZFS. This is where I saw
the problem initially.

> If you are interested in testing this with NFSv4 and it still fails the
> same way, then also consider testing this with a local file instead of an
> NFS-mounted file and let us know the results.
>
> I'm also unsure about using the same path for the pool and the zone root
> path, rather than one path for the pool and a pool/dataset path for the
> zone root. I will test this myself if I get some time.

I have been unable to reproduce this with a local file. I have been able to
reproduce it with NFSv4 on build 130. Rather surprisingly, the actual
checksums found in the ereports are sometimes 0x0 0x0 0x0 0x0 or
0xbaddcafe00...

Here's what I did:

- Install OpenSolaris build 130 (ldom on T5220)

- Mount some NFS space at /nfszone:

    mount -F nfs -o vers=4 $file:/path /nfszone

- Create a 10 GB sparse file:

    cd /nfszone
    mkfile -n 10g root

- Create a zpool:

    zpool create -m /zones/nfszone nfszone /nfszone/root

- Configure and install a zone:

    zonecfg -z nfszone
      set zonepath = /zones/nfszone
      set autoboot = false
      verify
      commit
      exit
    chmod 700 /zones/nfszone
    zoneadm -z nfszone install

- Verify that the nfszone pool is clean.

First, pkg history in the zone shows the timestamp of the last package
operation:

    2010-01-07T20:27:07  install  pkg  Succeeded

At 20:31 I ran:

# zpool status nfszone
  pool: nfszone
 state: ONLINE
 scrub: none requested
config:

        NAME             STATE     READ WRITE CKSUM
        nfszone          ONLINE       0     0     0
          /nfszone/root  ONLINE       0     0     0

errors: No known data errors

I booted the zone. By 20:32 it had accumulated 132 checksum errors:

# zpool status nfszone
  pool: nfszone
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: none requested
config:

        NAME             STATE     READ WRITE CKSUM
        nfszone          DEGRADED     0     0     0
          /nfszone/root  DEGRADED     0     0   132  too many errors

errors: No known data errors

fmdump has some very interesting things to say about the actual checksums.
The 0x0 and 0xbaddcafe00 values seem to shout that these checksum errors are
not due to a couple of flipped bits (0xbaddcafe is the pattern Solaris kmem
debugging uses to fill uninitialized memory, and an all-zero checksum is what
you would expect from an all-zero block):

# fmdump -eV | grep cksum_actual | sort | uniq -c | sort -n | tail
   2  cksum_actual = 0x14c538b06b6 0x2bb571a06ddb0 0x3e05a7c4ac90c62 0x290cbce13fc59dce
   3  cksum_actual = 0x175bb95fc00 0x1767673c6fe00 0xfa9df17c835400 0x7e0aef335f0c7f00
   3  cksum_actual = 0x2eb772bf800 0x5d8641385fc00 0x7cf15b214fea800 0xd4f1025a8e66fe00
   4  cksum_actual = 0x0 0x0 0x0 0x0
   4  cksum_actual = 0x1d32a7b7b00 0x248deaf977d80 0x1e8ea26c8a2e900 0x330107da7c4bcec0
   5  cksum_actual = 0x14b8f7afe6 0x915db8d7f87 0x205dc7979ad73 0x4e0b3a8747b8a8
   6  cksum_actual = 0x1184cb07d00 0xd2c5aab5fe80 0x69ef5922233f00 0x280934efa6d20f40
   6  cksum_actual = 0x348e6117700 0x765aa1a547b80 0xb1d6d98e59c3d00 0x89715e34fbf9cdc0
  16  cksum_actual = 0xbaddcafe00 0x5dcc54647f00 0x1f82a459c2aa00 0x7f84b11b3fc7f80
  48  cksum_actual = 0x5d6ee57f00 0x178a70d27f80 0x3fc19c3a19500 0x82804bc6ebcfc0

I halted the zone, exported the pool, imported the pool, then did a scrub.
Everything seemed to be OK:

# zpool export nfszone
# zpool import -d /nfszone nfszone
# zpool status nfszone
  pool: nfszone
 state: ONLINE
 scrub: none requested
config:

        NAME             STATE     READ WRITE CKSUM
        nfszone          ONLINE       0     0     0
          /nfszone/root  ONLINE       0     0     0

errors: No known data errors

# zpool scrub nfszone
# zpool status nfszone
  pool: nfszone
 state: ONLINE
 scrub: scrub completed after 0h0m with 0 errors on Thu Jan 7 21:56:47 2010
config:

        NAME             STATE     READ WRITE CKSUM
        nfszone          ONLINE       0     0     0
          /nfszone/root  ONLINE       0     0     0

errors: No known data errors

But then I booted the zone...

# zoneadm -z nfszone boot
# zpool status nfszone
  pool: nfszone
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
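For anyone who wants to repeat this, the steps above collapse into a short
command sequence. This is only a sketch of the procedure described in this
message, not a tested script: the server name "filer" and the export path
"/export/scratch" are placeholders, and the one-minute wait after boot is an
arbitrary choice.

    # Mount the NFS space that will hold the backing file (placeholder names).
    mkdir -p /nfszone
    mount -F nfs -o vers=4 filer:/export/scratch /nfszone

    # Back a pool with a sparse 10 GB file on the NFS mount.
    cd /nfszone
    mkfile -n 10g root
    zpool create -m /zones/nfszone nfszone /nfszone/root

    # Configure and install the zone on the file-backed pool.
    zonecfg -z nfszone 'create; set zonepath=/zones/nfszone; set autoboot=false; verify; commit'
    chmod 700 /zones/nfszone
    zoneadm -z nfszone install

    # The pool should report clean before boot and accumulate
    # checksum errors shortly after the zone comes up.
    zpool status -x nfszone
    zoneadm -z nfszone boot
    sleep 60
    zpool status -x nfszone
    fmdump -eV | grep cksum_actual | sort | uniq -c | sort -n | tail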
Re: [zfs-discuss] Zones on shared storage - a warning
Hi Mike,

I can't really speak for how virtualization products are using files for
pools, but we don't recommend creating pools on files, much less on
NFS-mounted files, and then building zones on top.

File-based pool configurations might be used for limited internal testing of
some features, but our product testing does not include storage pools on
files or NFS-mounted files.

Unless Ed's project gets re-funded, I'm not sure how much further you can go
with this approach.

Thanks,

Cindy
Re: [zfs-discuss] Zones on shared storage - a warning
hey mike/cindy,

i've gone ahead and filed a zfs rfe on this functionality:

    6915127 need full support for zfs pools on files

implementing this rfe is a requirement for supporting encapsulated zones on
shared storage.

ed
Re: [zfs-discuss] Zones on shared storage - a warning
Hi Mike,

It is difficult to comment on the root cause of this failure since the
interactions between these features are unknown. You might consider seeing
how Ed's proposal plays out and let him do some more testing...

If you are interested in testing this with NFSv4 and it still fails the same
way, then also consider testing this with a local file instead of an
NFS-mounted file and let us know the results.

I'm also unsure about using the same path for the pool and the zone root
path, rather than one path for the pool and a pool/dataset path for the zone
root. I will test this myself if I get some time.

Thanks,

Cindy
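To follow up on the two suggestions above, here is a minimal sketch of the
control test: a pool backed by a local file rather than an NFS-mounted one,
with the zone root on a child dataset so the pool mountpoint and the zone
root path differ. The file path, pool, dataset, and zone names below are
made up for illustration.

    # Local file instead of an NFS-mounted file.
    mkfile -n 10g /var/tmp/testpool.img
    zpool create -m /testpool testpool /var/tmp/testpool.img

    # Give the zone its own dataset under the pool, so the zone root path
    # is a pool/dataset path rather than the pool's own mountpoint.
    zfs create testpool/testzone        # mounts at /testpool/testzone
    chmod 700 /testpool/testzone

    zonecfg -z testzone 'create; set zonepath=/testpool/testzone; set autoboot=false; verify; commit'
    zoneadm -z testzone install
    zoneadm -z testzone boot

    # Check whether the local-file pool also degrades after boot.
    zpool status -x testpool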
[zfs-discuss] Zones on shared storage - a warning
I've been playing around with zones on NFS a bit and have run into what
looks to be a pretty bad snag - ZFS keeps seeing read and/or checksum
errors. This exists with S10u8 and OpenSolaris dev build snv_129. This is
likely a blocker for anyone thinking of implementing parts of Ed's Zones on
Shared Storage: http://hub.opensolaris.org/bin/view/Community+Group+zones/zoss

The OpenSolaris example appears below. The order of events is:

1) Create a file on NFS, turn it into a zpool
2) Configure a zone with the pool as zonepath
3) Install the zone, verify that the pool is healthy
4) Boot the zone, observe that the pool is sick

r...@soltrain19# mount filer:/path /mnt
r...@soltrain19# cd /mnt
r...@soltrain19# mkdir osolzone
r...@soltrain19# mkfile -n 8g root
r...@soltrain19# zpool create -m /zones/osol osol /mnt/osolzone/root
r...@soltrain19# zonecfg -z osol
osol: No such zone configured
Use 'create' to begin configuring a new zone.
zonecfg:osol> create
zonecfg:osol> info
zonename: osol
zonepath:
brand: ipkg
autoboot: false
bootargs:
pool:
limitpriv:
scheduling-class:
ip-type: shared
hostid:
zonecfg:osol> set zonepath=/zones/osol
zonecfg:osol> set autoboot=false
zonecfg:osol> verify
zonecfg:osol> commit
zonecfg:osol> exit
r...@soltrain19# chmod 700 /zones/osol
r...@soltrain19# zoneadm -z osol install
   Publisher: Using opensolaris.org (http://pkg.opensolaris.org/dev/
              http://pkg-na-2.opensolaris.org/dev/).
   Publisher: Using contrib (http://pkg.opensolaris.org/contrib/).
       Image: Preparing at /zones/osol/root.
       Cache: Using /var/pkg/download.
Sanity Check: Looking for 'entire' incorporation.
  Installing: Core System (output follows)
DOWNLOAD                                   PKGS        FILES    XFER (MB)
Completed                                 46/46  12334/12334    93.1/93.1

PHASE                                        ACTIONS
Install Phase                            18277/18277
No updates necessary for this image.

  Installing: Additional Packages (output follows)
DOWNLOAD                                   PKGS        FILES    XFER (MB)
Completed                                 36/36    3339/3339    21.3/21.3

PHASE                                        ACTIONS
Install Phase                              4466/4466

        Note: Man pages can be obtained by installing SUNWman
 Postinstall: Copying SMF seed repository ... done.
 Postinstall: Applying workarounds.
        Done: Installation completed in 2139.186 seconds.

  Next Steps: Boot the zone, then log into the zone console (zlogin -C)
              to complete the configuration process.

6.3 Boot the OpenSolaris zone

r...@soltrain19# zpool status osol
  pool: osol
 state: ONLINE
 scrub: none requested
config:

        NAME                  STATE     READ WRITE CKSUM
        osol                  ONLINE       0     0     0
          /mnt/osolzone/root  ONLINE       0     0     0

errors: No known data errors

r...@soltrain19# zoneadm -z osol boot
r...@soltrain19# zpool status osol
  pool: osol
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: none requested
config:

        NAME                  STATE     READ WRITE CKSUM
        osol                  DEGRADED     0     0     0
          /mnt/osolzone/root  DEGRADED     0     0   117  too many errors

errors: No known data errors

r...@soltrain19# zlogin osol uptime
  5:31pm  up 1 min(s),  0 users,  load average: 0.69, 0.38, 0.52

--
Mike Gerdts
http://mgerdts.blogspot.com/
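If the before/after comparison in the transcript above needs to be repeated,
it can be scripted by parsing the CKSUM column of 'zpool status'. A small
sketch, assuming the pool and zone names used above; the one-minute sleep is
an arbitrary settling period, not anything from the original test.

    pool=osol
    zone=osol

    # CKSUM count on the file-backed vdev before booting the zone.
    before=$(zpool status $pool | awk '$1 ~ /root$/ {print $5}')

    zoneadm -z $zone boot
    sleep 60    # give the zone a minute to generate I/O

    # ...and again after boot; any increase is the problem showing up.
    after=$(zpool status $pool | awk '$1 ~ /root$/ {print $5}')
    echo "checksum errors on the file vdev: before=$before after=$after"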
Re: [zfs-discuss] Zones on shared storage - a warning
On Tue, Dec 22, 2009 at 8:02 PM, Mike Gerdts mger...@gmail.com wrote:
> I've been playing around with zones on NFS a bit and have run into what
> looks to be a pretty bad snag - ZFS keeps seeing read and/or checksum
> errors. This exists with S10u8 and OpenSolaris dev build snv_129.
[snip]

An off-list conversation and a bit of digging into other tests I have done
show that this is likely limited to NFSv3. I cannot say that this problem
has been seen with NFSv4.

--
Mike Gerdts
http://mgerdts.blogspot.com/
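One way to pin down the protocol-version dependence would be to run the same
test against the same export mounted both ways and compare. This is only a
rough sketch; the server name "filer", the export path, and the pool names
are placeholders.

    # Mount one export twice, forcing NFSv3 on one mount and NFSv4 on the other.
    mkdir -p /mnt/v3 /mnt/v4
    mount -F nfs -o vers=3 filer:/export/scratch /mnt/v3
    mount -F nfs -o vers=4 filer:/export/scratch /mnt/v4

    # Back one test pool with a file on each mount.
    mkfile -n 8g /mnt/v3/pool.img
    mkfile -n 8g /mnt/v4/pool.img
    zpool create -m /zones/v3test v3test /mnt/v3/pool.img
    zpool create -m /zones/v4test v4test /mnt/v4/pool.img

    # Then repeat the zone install/boot steps against each pool and compare
    # 'zpool status' and 'fmdump -e' afterwards.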