ZFS: i/o error - all block copies unavailable on large disk number machines
We did a minor kernel update today on a large storage machine here, which runs FreeBSD 8.2, and to our surprise it failed to boot, stopping at the loader with "ZFS: i/o error - all block copies unavailable".

After some digging we discovered that this was likely because the BIOS only enumerates the first 12 disks, and this machine has more than that in the root zpool, which was a striped raidz2 volume. This in turn means that the bootcode can't complete and hence the machine can't boot.

Our solution was to migrate the root fs off the raidz2 volume and onto a mirrored volume on two disks which were accessible from the BIOS.

It would of course be nice if zfs warned about or even prevented this; any thoughts?

More info, if we find it, will be available here:
http://blog.multiplay.co.uk/2012/01/zfs-io-error-all-block-copies-unavailable-on-large-disk-number-machines/

Regards
Steve

This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 or return the E.mail to postmas...@multiplay.co.uk.
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
Re: ZFS: i/o error - all block copies unavailable on large disk number machines
On Jan 23, 2012, at 9:04 AM, Steven Hartland wrote:
> After some digging we discovered that this was likely due to the fact that the BIOS only enumerates the first 12 disks and this machine has more than that in the root zpool which was a striped raidz2 volume. This in turn means that the bootcode can't complete and hence the machine can't boot.

As far as I can tell, ZFS best practices guides recommend no more than nine drives in a group/pool. Putting more than that into a pool, much less something you are trying to boot from, seems like a fine experiment to make but is not something which I would rely upon...

Regards,
-- 
-Chuck
Re: ZFS: i/o error - all block copies unavailable on large disk number machines
On 23/01/2012 18:06, Chuck Swiger wrote:
> On Jan 23, 2012, at 9:04 AM, Steven Hartland wrote:
>> After some digging we discovered that this was likely due to the fact that the BIOS only enumerates the first 12 disks and this machine has more than that in the root zpool which was a striped raidz2 volume. This in turn means that the bootcode can't complete and hence the machine can't boot.
>
> As far as I can tell, ZFS best practices guides recommend no more than nine drives in a group/pool. Putting more than that into a pool, much less something you are trying to boot from, seems like a fine experiment to make but is not something which I would rely upon...

Even if you do split up your pool into vdevs using 8 drives, you will still run into the problem with zfs being unable to assemble the pool unless it sees all of the drives in it.

Interesting that this only appeared as part of a minor kernel update. I ran into this myself with 8-STABLE, no indication that there was a fix possible by juggling kernels.

Cheers,

Matthew

-- 
Dr Matthew J Seaman MA, D.Phil.                   7 Priory Courtyard
                                                  Flat 3
PGP: http://www.infracaninophile.co.uk/pgpkey     Ramsgate
JID: matt...@infracaninophile.co.uk               Kent, CT11 9PW
Re: ZFS: i/o error - all block copies unavailable on large disk number machines
----- Original Message -----
From: Chuck Swiger

> On Jan 23, 2012, at 9:04 AM, Steven Hartland wrote:
>> After some digging we discovered that this was likely due to the fact that the BIOS only enumerates the first 12 disks and this machine has more than that in the root zpool which was a striped raidz2 volume. This in turn means that the bootcode can't complete and hence the machine can't boot.
>
> As far as I can tell, ZFS best practices guides recommend no more than nine drives in a group/pool. Putting more than that into a pool, much less something you are trying to boot from, seems like a fine experiment to make but is not something which I would rely upon...

Not something I've seen made clear, but quite possibly. Even with 9 disks you could easily get this if the BIOS doesn't see all of said disks, be that initially or due to disks added to the machine.

For reference, the original install was done on a zpool with 6 disks in a raidz2 config, but then 6 additional disks were added to expand capacity. It was only when the new kernel was installed that data required to boot was written to disks in the second raidz2, which is inaccessible to the boot code even though it is in perfect working order on a booted system.

So something to document, watch out for and potentially safeguard against? It may be something specific to machines with a legacy BIOS, hence not an issue with Sun kit?

What made it more interesting is that the boot code could see the directory structure but clearly not all of the required data. Would it be possible for the boot code to provide a more coherent message in this case?

Regards
Steve
Re: ZFS: i/o error - all block copies unavailable on large disk number machines
----- Original Message -----
From: Matthew Seaman

> Even if you do split up your pool into vdevs using 8 drives, you will still run into the problem with zfs being unable to assemble the pool unless it sees all of the drives in it.
>
> Interesting that this only appeared as part of a minor kernel update. I ran into this myself with 8-STABLE, no indication that there was a fix possible by juggling kernels.

Indeed, this was nothing to do with the changes in the kernel; it's purely down to which disks the physical copies of the data live on within the boot zpool. At least that's what I believe is the key here.

For reference, the layout here is the following:

  pool: tank2
 state: ONLINE
 scrub: none requested
config:

    NAME                                            STATE     READ WRITE CKSUM
    tank2                                           ONLINE       0     0     0
      raidz2                                        ONLINE       0     0     0
        gptid/aad3bd9f-05a2-11e1-8d4a-0025903b854c  ONLINE       0     0     0
        gptid/abbe61d0-05a2-11e1-8d4a-0025903b854c  ONLINE       0     0     0
        gptid/aca6dba7-05a2-11e1-8d4a-0025903b854c  ONLINE       0     0     0
        gptid/ad90c2ba-05a2-11e1-8d4a-0025903b854c  ONLINE       0     0     0
        gptid/ae773314-05a2-11e1-8d4a-0025903b854c  ONLINE       0     0     0
        gptid/af5dea39-05a2-11e1-8d4a-0025903b854c  ONLINE       0     0     0
      raidz2                                        ONLINE       0     0     0
        da0p1                                       ONLINE       0     0     0
        da1p1                                       ONLINE       0     0     0
        da2p1                                       ONLINE       0     0     0
        da3p1                                       ONLINE       0     0     0
        gptid/c21396ba-05a5-11e1-bce9-0025903b854c  ONLINE       0     0     0
        gptid/c21b30b9-05a5-11e1-bce9-0025903b854c  ONLINE       0     0     0
    cache
      ada0p3                                        ONLINE       0     0     0
      ada1p3                                        ONLINE       0     0     0
    spares
      gptid/4eb3ef4c-05a6-11e1-bce9-0025903b854c    AVAIL
      gptid/c2ba092d-05a5-11e1-bce9-0025903b854c    AVAIL

Initially the zpool was just the first raidz2. Only after install was the second raidz2 added to increase capacity.

So what I believe has happened is that the new kernel, when installed, happened to have data located on the second raidz2, which consists of disks not available to the BIOS, and this results in "all block copies unavailable" from the boot code.

Regards
Steve
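The failure mode described in this message can be modelled in a few lines. The following is a minimal Python sketch, not FreeBSD boot code: the `Vdev` class, the function names and the disk labels `a0`..`b5` are all hypothetical, and it assumes the boot code can only read disks the BIOS enumerates and can reconstruct raidz data from parity the way the kernel does. Which disks the BIOS actually exposed on this machine is not stated in the thread, so the visible set below is an assumption too.

```python
from dataclasses import dataclass

# Missing disks each redundancy type can tolerate (mirror handled separately).
PARITY = {"stripe": 0, "raidz1": 1, "raidz2": 2, "raidz3": 3}


@dataclass(frozen=True)
class Vdev:
    name: str      # label used for reporting only (hypothetical)
    kind: str      # "stripe", "mirror", "raidz1", "raidz2" or "raidz3"
    disks: tuple   # member disk labels


def tolerated_missing(vdev):
    """Number of member disks that may be absent with data still readable."""
    if vdev.kind == "mirror":
        return len(vdev.disks) - 1
    return PARITY[vdev.kind]


def unreadable_vdevs(pool, bios_visible):
    """Names of top-level vdevs the boot code could not read, given the
    set of disks the BIOS enumerates."""
    bad = []
    for vdev in pool:
        missing = sum(1 for d in vdev.disks if d not in bios_visible)
        if missing > tolerated_missing(vdev):
            bad.append(vdev.name)
    return bad


# Two 6-disk raidz2 vdevs, mirroring the shape of the tank2 layout above.
pool = [
    Vdev("raidz2-0", "raidz2", tuple(f"a{i}" for i in range(6))),
    Vdev("raidz2-1", "raidz2", tuple(f"b{i}" for i in range(6))),
]

# Assumption: the BIOS sees every disk of the first vdev, one of the second.
visible = {f"a{i}" for i in range(6)} | {"b0"}
print(unreadable_vdevs(pool, visible))
```

Any block that lands on a vdev in that list is out of reach at boot time, however healthy the pool looks once the kernel is running.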
Re: ZFS: i/o error - all block copies unavailable on large disk number machines
On 23/01/2012 19:29, Steven Hartland wrote:
> Initially the zpool was just the first raidz2. Only after install was the second raidz2 added to increase capacity.
>
> So what I believe has happened is the new kernel when installed happens to have data be located on the second raidz2 which consists of disks not available to the BIOS and hence results in all block copies unavailable from the boot code.

Exactly what happened to me. You can run into this in a nasty way -- insert the drives, and expand the zpool on-line, and everything will carry on quite happily. Until you next reboot, when it just won't come back. And you can't undo the expansion of the pool: due to the copy-on-write behaviour of ZFS even overwriting a file in-place stands a 50% chance of being written to the new vdev.

In my case, I fixed it by having a separate /boot on some USB sticks -- this was only ever accessed to read the kernel, kernel modules and bootloader at boot time, so no worries over performance.

Cheers,

Matthew
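The 50% figure above is per write. To see why a whole kernel is very unlikely to stay on the original vdev, here is a rough Python sketch. It assumes every block is placed independently and lands on the new vdev with a fixed probability; that independence, the function name and the block counts are assumptions for illustration only, and the real allocator also weighs free space, so treat the numbers as indicative.

```python
def chance_touches_new_vdev(blocks, share_new=0.5):
    """Probability that at least one of `blocks` independently placed
    blocks lands on the new vdev, which receives `share_new` of writes."""
    if blocks < 0 or not 0.0 <= share_new <= 1.0:
        raise ValueError("blocks must be >= 0 and share_new within [0, 1]")
    return 1.0 - (1.0 - share_new) ** blocks


# Illustrative block counts; a real kernel spans far more blocks than this.
for n in (1, 2, 10, 100):
    print(n, chance_touches_new_vdev(n))
```

Under these assumptions a single rewritten file of even a handful of blocks almost certainly ends up with some data on the added disks, which matches the "works until the next reboot" experience described in this message.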
Re: ZFS: i/o error - all block copies unavailable on large disk number machines
On Monday, 23 January 2012, Matthew Seaman <m.sea...@infracaninophile.co.uk> wrote:
> On 23/01/2012 19:29, Steven Hartland wrote:
>> Initially the zpool was just the first raidz2. Only after install was the second raidz2 added to increase capacity.
>>
>> So what I believe has happened is the new kernel when installed happens to have data be located on the second raidz2 which consists of disks not available to the BIOS and hence results in all block copies unavailable from the boot code.
>
> Exactly what happened to me. You can run into this in a nasty way -- insert the drives, and expand the zpool on-line, and everything will carry on quite happily. Until you next reboot, when it just won't come back. And you can't undo the expansion of the pool: due to the copy-on-write behaviour of ZFS even overwriting a file in-place stands a 50% chance of being written to the new vdev.
>
> In my case, I fixed it by having a separate /boot on some USB sticks -- this was only ever accessed to read the kernel, kernel modules and bootloader at boot time, so no worries over performance.

Have you tried using a separate /boot on a zfs filesystem with copies=2 (or copies set to the number of vdevs composing the pool)?

-- 
Olivier Smedts                                                 _
                                        ASCII ribbon campaign ( )
e-mail: oliv...@gid0.org        - against HTML email & vCards  X
www: http://www.gid0.org    - against proprietary attachments / \

  "There are only 10 kinds of people in the world:
  those who understand binary, and those who don't."
Re: ZFS: i/o error - all block copies unavailable on large disk number machines
On Mon, Jan 23, 2012 at 11:21 AM, Steven Hartland <kill...@multiplay.co.uk> wrote:
> Not something I've seen made clear, but quite possibly. Even with 9 disks you could easily get this if the BIOS doesn't see all of said disks, be that initially or due to disks added to the machine.
>
> For reference the original install was done on a zpool with 6 disks in a raidz2 config but then 6 additional disks were added to expand capacity. It was only when the new kernel was installed that data required to boot was then written to disks in the second raidz2 which is inaccessible to the boot code even though in perfect working order on a booted system.
>
> So something to document, watch out for and potentially safeguard against? It may be something specific to machines with legacy BIOS hence not an issue with Sun kit?

From what I've gathered on the zfs-discuss mailing list, Solaris only supports mirror vdevs in an rpool (bootable pool), and only a single vdev in the rpool.

FreeBSD is (AFAIK) the only ZFS implementation that supports booting from a raidz vdev, and from a pool with multiple raidz vdevs.

IME, separating the bootable disks from the storage disks will always save you time, effort, and grief in the long run. :) Whether that means using a separate UFS / filesystem, or a mirrored set of disks for /, or a separate ZFS pool with a single mirror vdev is up to the admin. But boot/OS should be separate from bulk storage. :)

-- 
Freddie Cash
fjwc...@gmail.com
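The rule of thumb in this message is easy to turn into a check. The sketch below is a hypothetical Python helper: the function name and the warning texts are made up for illustration, and it inspects a plain list of vdev types rather than a real pool. It flags boot pool layouts that depart from the single-mirror arrangement described above for Solaris.

```python
def boot_pool_warnings(vdev_kinds):
    """Return warnings for a boot pool described by the types of its
    top-level data vdevs, e.g. ["mirror"] or ["raidz2", "raidz2"]."""
    warnings = []
    if not vdev_kinds:
        warnings.append("pool has no data vdevs")
    if len(vdev_kinds) > 1:
        warnings.append(
            f"{len(vdev_kinds)} top-level vdevs: boot data may land on "
            "disks the BIOS cannot see"
        )
    non_mirror = sorted({k for k in vdev_kinds if k != "mirror"})
    if non_mirror:
        warnings.append("non-mirror vdev types in use: " + ", ".join(non_mirror))
    return warnings


print(boot_pool_warnings(["mirror"]))             # conservative layout
print(boot_pool_warnings(["raidz2", "raidz2"]))   # layout from this thread
```

A check like this could run before a pool holding the boot filesystem is expanded, which is roughly the safeguard asked for earlier in the thread.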
Re: ZFS: i/o error - all block copies unavailable on large disk number machines
----- Original Message -----
From: Olivier Smedts

>> In my case, I fixed it by having a separate /boot on some USB sticks -- this was only ever accessed to read the kernel, kernel modules and bootloader at boot time, so no worries over performance.

Out of interest, what's the procedure you used for that, Matthew?

> Have you tried using a separate /boot on a zfs with copies=2 (or the number of vdevs composing the pool) ?

Interesting idea, but this assumes that the insertion of the new disks doesn't affect the visibility of the old disks. While this may sound like it shouldn't happen, I can see cases where it may well do, e.g. adding disks to a multi-controller system where the IDs of the added disks are lower and hence preferred by the BIOS.

Regards
Steve
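The copies=2 idea and the objection to it can both be seen in a toy model. This Python sketch treats each block as the set of vdevs holding a copy of it, and calls a file bootable when every block has at least one copy on a vdev the boot code can read. The vdev names, file names and placements are invented, and it assumes copies really do end up on different vdevs, which ZFS attempts but does not guarantee.

```python
def file_readable(blocks, readable_vdevs):
    """A file is readable when every block has a copy on a readable vdev.
    `blocks` is a list of sets; each set names the vdevs holding a copy."""
    return all(block & readable_vdevs for block in blocks)


def boot_report(files, readable_vdevs):
    """Map each file name to whether the boot code could read all of it."""
    return {name: file_readable(blocks, readable_vdevs)
            for name, blocks in files.items()}


files = {
    # copies=1: the second block happened to land on the new vdev only
    "kernel-single-copy": [{"old"}, {"new"}],
    # copies=2 after expansion: every block has one copy on each vdev
    "kernel-two-copies": [{"old", "new"}, {"old", "new"}],
    # copies=2 set before the pool was expanded: both copies on the old vdev
    "kernel-copies-before-expansion": [{"old"}, {"old"}],
}

print(boot_report(files, {"old"}))   # BIOS sees only the original disks
print(boot_report(files, {"new"}))   # BIOS now prefers the added disks
```

The second call corresponds to the scenario raised in this message: data written before the expansion exists only on the old vdev, so extra copies do not help if the added disks displace the old ones in the BIOS ordering.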
Re: ZFS: i/o error - all block copies unavailable on large disk number machines
On 23/01/2012 20:47, Steven Hartland wrote:
>> In my case, I fixed it by having a separate /boot on some USB sticks -- this was only ever accessed to read the kernel, kernel modules and bootloader at boot time, so no worries over performance.
>
> Out of interest whats the procedure you used for that Matthew?

It was basically this:

http://wiki.freebsd.org/RootOnZFS/UFSBoot

but I already had a system installed on the ZFS, so Section 2 of that was pretty much a no-op for me. I built the USB sticks offline, then went into the datacenter with them to revive that server.

I can't remember now whether I just had two bootable USB sticks -- one to use, and one as a spare -- or whether I'd made a gmirror from the two. Seeing as they were something like £8 apiece having several on hand was easily justifiable.

Cheers,

Matthew