Re: [zfs-discuss] trouble adding log and cache on SSD to a pool
On Thu, Aug 04, 2011 at 11:58:47PM +0200, Eugen Leitl wrote:
> On Thu, Aug 04, 2011 at 02:43:30PM -0700, Larry Liu wrote:
> > > root@nexenta:/export/home/eugen# zpool add tank log /dev/dsk/c3d1p0
> > You should use c3d1s0 here.
> > > root@nexenta:/export/home/eugen# zpool add tank cache /dev/dsk/c3d1p1
> > Use c3d1s1.
> Thanks, that did the trick!
>
> root@nexenta:/export/home/eugen# zpool status tank
>   pool: tank
>  state: ONLINE
>   scan: scrub repaired 0 in 0h0m with 0 errors on Fri Aug  5 03:04:57 2011
> config:
>
>         NAME        STATE     READ WRITE CKSUM
>         tank        ONLINE       0     0     0
>           raidz2-0  ONLINE       0     0     0
>             c0t0d0  ONLINE       0     0     0
>             c0t1d0  ONLINE       0     0     0
>             c0t2d0  ONLINE       0     0     0
>             c0t3d0  ONLINE       0     0     0
>         logs
>           c3d1s0    ONLINE       0     0     0
>         cache
>           c3d1s1    ONLINE       0     0     0
>
> errors: No known data errors

Hmm, it doesn't seem to last more than a couple of hours under test load (mapped as a CIFS share receiving a bittorrent download with 10k small files in it at about 10 MByte/s) before falling from the pool:

root@nexenta:/export/home/eugen# zpool status tank
  pool: tank
 state: DEGRADED
status: One or more devices are faulted in response to persistent errors.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Replace the faulted device, or use 'zpool clear' to mark the device
        repaired.
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        tank        DEGRADED     0     0     0
          raidz2-0  ONLINE       0     0     0
            c0t0d0  ONLINE       0     0     0
            c0t1d0  ONLINE       0     0     0
            c0t2d0  ONLINE       0     0     0
            c0t3d0  ONLINE       0     0     0
        logs
          c3d1s0    FAULTED      0     4     0  too many errors
        cache
          c3d1s1    FAULTED     13 7.68K     0  too many errors

errors: No known data errors

dmesg sez:

Aug  5 05:53:26 nexenta EVENT-TIME: Fri Aug 5 05:53:26 CEST 2011
Aug  5 05:53:26 nexenta PLATFORM: ProLiant-MicroServer, CSN: CN7051P024, HOSTNAME: nexenta
Aug  5 05:53:26 nexenta SOURCE: zfs-diagnosis, REV: 1.0
Aug  5 05:53:26 nexenta EVENT-ID: 516e9c7c-9e29-c504-a422-db37838fa676
Aug  5 05:53:26 nexenta DESC: A ZFS device failed. Refer to http://sun.com/msg/ZFS-8000-D3 for more information.
Aug  5 05:53:26 nexenta AUTO-RESPONSE: No automated response will occur.
Aug  5 05:53:26 nexenta IMPACT: Fault tolerance of the pool may be compromised.
Aug  5 05:53:26 nexenta REC-ACTION: Run 'zpool status -x' and replace the bad device.
Aug  5 05:53:39 nexenta fmd: [ID 377184 daemon.error] SUNW-MSG-ID: ZFS-8000-FD, TYPE: Fault, VER: 1, SEVERITY: Major
Aug  5 05:53:39 nexenta EVENT-TIME: Fri Aug 5 05:53:39 CEST 2011
Aug  5 05:53:39 nexenta PLATFORM: ProLiant-MicroServer, CSN: CN7051P024, HOSTNAME: nexenta
Aug  5 05:53:39 nexenta SOURCE: zfs-diagnosis, REV: 1.0
Aug  5 05:53:39 nexenta EVENT-ID: 3319749a-b6f7-c305-ec86-d94897dde85b
Aug  5 05:53:39 nexenta DESC: The number of I/O errors associated with a ZFS device exceeded
Aug  5 05:53:39 nexenta       acceptable levels. Refer to http://sun.com/msg/ZFS-8000-FD for more information.
Aug  5 05:53:39 nexenta AUTO-RESPONSE: The device has been offlined and marked as faulted. An attempt
Aug  5 05:53:39 nexenta       will be made to activate a hot spare if available.
Aug  5 05:53:39 nexenta IMPACT: Fault tolerance of the pool may be compromised.
Aug  5 05:53:39 nexenta REC-ACTION: Run 'zpool status -x' and replace the bad device.

-- 
Eugen* Leitl <leitl> http://leitl.org
ICBM: 48.07100, 11.36820 http://www.ativel.com http://postbiota.org
8B29F6BE: 099D 78BA 2FD3 B014 B08A 7779 75B0 2443 8B29 F6BE
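Once the underlying cause of the I/O errors is fixed, the faulted log and cache devices still have to be cleared, or removed and re-added, before the pool returns to ONLINE. A minimal sketch of that cleanup, assuming the same tank pool and c3d1s0/c3d1s1 slices as above (removing a separate log device requires a reasonably recent zpool version, so treat this as illustrative rather than a procedure from the thread):

    # Try to clear the error counters first; if the devices come back
    # healthy, nothing else is needed.
    zpool clear tank c3d1s0
    zpool clear tank c3d1s1

    # Otherwise remove the faulted log and cache devices ...
    zpool remove tank c3d1s0      # separate log (slog) slice
    zpool remove tank c3d1s1      # L2ARC cache slice

    # ... and re-add them once the controller/BIOS issue is resolved.
    zpool add tank log /dev/dsk/c3d1s0
    zpool add tank cache /dev/dsk/c3d1s1

    zpool status tank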
[zfs-discuss] Kernel panic on zpool import. 200G of data inaccessible! assertion failed: zvol_get_stats(os, nv) == 0
System: snv_151a 64 bit on Intel.

Error: panic[cpu0] assertion failed: zvol_get_stats(os, nv) == 0, file: ../../common/fs/zfs/zfs_ioctl.c, line: 1815

Failure first seen on Solaris 10, update 8.

History: I recently received two 320G drives and realized from reading this list that it would have been better to do the install on the small drives, but I didn't have them at the time. I added the two 320G drives and created the tank mirror. I moved some data from other sources to the tank and then decided to go ahead and do a new install. In preparation for that I moved all the data I wanted to save onto the rpool mirror and then installed Solaris 10 update 8 again on the 320G drives.

When my system rebooted after the installation, I saw that for some reason it used my tank pool as root. I realize now that since it was originally a root pool and had boot blocks, this didn't help. Anyway, I shut down, changed the boot order and then booted into my system. It panicked when trying to access the tank and instantly rebooted. I had to go through this several times until I caught a glimpse of one of the first messages:

assertion failed: zvol_get_stats(os, nv)

Here is what my system looks like when I boot into failsafe mode.

# zpool import
  pool: rpool
    id: 16453600103421700325
 state: ONLINE
action: The pool can be imported using its name or numeric identifier.
config:

        rpool         ONLINE
          mirror      ONLINE
            c0t2d0s0  ONLINE
            c0t3d0s0  ONLINE

  pool: tank
    id: 12861119534757646169
 state: ONLINE
action: The pool can be imported using its name or numeric identifier.
config:

        tank          ONLINE
          mirror      ONLINE
            c0t0d0s0  ONLINE
            c0t1d0s0  ONLINE

# zpool import tank
cannot import 'tank': pool may be in use from other system
use '-f' to import anyway

I installed Solaris 11 Express USB via Hiroshi-san's Windows tool. Unfortunately it also panics trying to import the pool, although zpool import shows the pool online with no errors, just like in the output above.

http://imageshack.us/photo/my-images/13/zfsimportfail.jpg/

And here is an eerily identical photo capture made by somebody with a similar/identical error:

http://prestonconnors.com/zvol_get_stats.jpg

At first I thought it was a copy of my screenshot, but I see his terminal is white and mine is black. It looks like the problem has been around since 2009, although my problem is with a newly created mirror pool that had plenty of space available (200G in use out of about 500G) and no snapshots were taken.

Similar discussion with a discouraging lack of follow-up:
http://opensolaris.org/jive/message.jspa?messageID=376366

This looks like the defect; it's closed and I see no resolution:
https://defect.opensolaris.org/bz/show_bug.cgi?id=5682

I have about 200G of data on the tank pool, about 100G or so of which I don't have anywhere else. I created this pool specifically to make a safe place to store data that I had accumulated over several years and didn't have organized yet. I can't believe such a serious bug has been around for two years and hasn't been fixed. Can somebody please help me get this data back?

Thank you.

Jim

I joined the forums but I didn't see my post on the zfs-discuss mailing list, which seems a lot more active than the forum. Sorry if this is a duplicate for people on the mailing list.
Re: [zfs-discuss] trouble adding log and cache on SSD to a pool
I think I've found the source of my problem: I need to reflash the N36L BIOS to a hacked Russian version (sic) which allows AHCI on the 5th drive bay.

http://terabyt.es/2011/07/02/nas-build-guide-hp-n36l-microserver-with-nexenta-napp-it/

...

Update BIOS and install hacked Russian BIOS

The HP BIOS for the N36L does not support anything but legacy IDE emulation on the internal ODD SATA port and the external eSATA port. This is a problem for Nexenta, which can detect false disk errors when using the ODD drive in emulated IDE mode. Luckily an unknown Russian hacker somewhere has modified the BIOS to allow AHCI mode on both the internal and eSATA ports. I have always said, “Give the Russians two weeks and they will crack anything” and usually that has held true. Huge thank you to whomever has modified this BIOS, given HP's complete failure to do so. I have enabled this with good results. The main one being no emails from Nexenta informing you that the syspool has moved to a degraded state when it actually hasn’t :)

...

On Fri, Aug 05, 2011 at 09:05:07AM +0200, Eugen Leitl wrote:
> [full zpool status and fmd output quoted in the previous message: the SSD
> log (c3d1s0) and cache (c3d1s1) slices were faulted out of the tank pool
> with "too many errors" after a few hours of CIFS load.]

-- 
Eugen* Leitl <leitl> http://leitl.org
[zfs-discuss] Disable ZIL - persistent
After a certain rev, I know you can set the sync property, and it takes effect immediately, and it's persistent across reboots. But that doesn't apply to Solaris 10.

My question: Is there any way to make "disabled ZIL" a normal mode of operations in Solaris 10?

Particularly: If I do this

    echo zil_disable/W0t1 | mdb -kw

then I have to remount the filesystem. It's kind of difficult to do this automatically at boot time, and impossible (as far as I know) for rpool. The only solution I see is to write some startup script which applies it to filesystems other than rpool. Which feels kludgy. Is there a better way?

Thanks...
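For reference, the runtime approach described above only affects filesystems mounted after the write, so it has to be combined with a remount. A rough sketch of the kind of startup script being considered, applied only to non-root filesystems; the dataset names are illustrative, not from the thread:

    #!/sbin/sh
    # Disable the ZIL in the running kernel (affects filesystems
    # mounted after this point).
    echo zil_disable/W0t1 | mdb -kw

    # Remount the non-rpool ZFS filesystems so they pick up the setting.
    # "tank/data" and "tank/home" are example dataset names.
    for fs in tank/data tank/home; do
            zfs unmount "$fs" && zfs mount "$fs"
    done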
Re: [zfs-discuss] Disable ZIL - persistent
On 08/05/11 13:11, Edward Ned Harvey wrote:
> After a certain rev, I know you can set the sync property, and it takes
> effect immediately, and it's persistent across reboots. But that doesn't
> apply to Solaris 10.
>
> My question: Is there any way to make "disabled ZIL" a normal mode of
> operations in Solaris 10?
>
> Particularly: If I do this
>     echo zil_disable/W0t1 | mdb -kw
> then I have to remount the filesystem. It's kind of difficult to do this
> automatically at boot time, and impossible (as far as I know) for rpool.
> The only solution I see is to write some startup script which applies it
> to filesystems other than rpool. Which feels kludgy. Is there a better way?

echo "set zfs:zil_disable = 1" >> /etc/system

-- 
Darren J Moffat
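To spell out the intent of that one-liner: it appends a tunable to /etc/system so the ZIL stays disabled across reboots, and the live value can be checked with mdb. A minimal sketch; the verification step is an assumption about how one would confirm it, not part of the original reply:

    # Persist the setting across reboots (Solaris 10; takes effect for all
    # filesystems, including rpool, after the next reboot).
    echo "set zfs:zil_disable = 1" >> /etc/system

    # After the reboot, confirm the kernel variable is set to 1.
    echo "zil_disable/D" | mdb -k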
Re: [zfs-discuss] Disable ZIL - persistent
On 05 August, 2011 - Darren J Moffat sent me these 0,9K bytes:

> On 08/05/11 13:11, Edward Ned Harvey wrote:
> > My question: Is there any way to make "disabled ZIL" a normal mode of
> > operations in Solaris 10?
> >
> > Particularly: If I do this
> >     echo zil_disable/W0t1 | mdb -kw
> > then I have to remount the filesystem. It's kind of difficult to do this
> > automatically at boot time, and impossible (as far as I know) for rpool.
> > The only solution I see is to write some startup script which applies it
> > to filesystems other than rpool. Which feels kludgy. Is there a better way?
>
> echo "set zfs:zil_disable = 1" >> /etc/system

Or use if you don't want to zap /etc/system..

/Tomas
-- 
Tomas Ögren, st...@acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
Re: [zfs-discuss] Disable ZIL - persistent
On 5 Aug 11, at 08:14, Darren J Moffat wrote:

> On 08/05/11 13:11, Edward Ned Harvey wrote:
> > My question: Is there any way to make "disabled ZIL" a normal mode of
> > operations in Solaris 10?
> >
> > Particularly: If I do this
> >     echo zil_disable/W0t1 | mdb -kw
> > then I have to remount the filesystem. It's kind of difficult to do this
> > automatically at boot time, and impossible (as far as I know) for rpool.
> > The only solution I see is to write some startup script which applies it
> > to filesystems other than rpool. Which feels kludgy. Is there a better way?
>
> echo "set zfs:zil_disable = 1" >> /etc/system

echo "set zfs:zil_disable = 1" >> /etc/system

Mike

---
Michael Sullivan
m...@axsh.us
http://www.axsh.us/
Phone: +1-662-259-
Mobile: +1-662-202-7716
Re: [zfs-discuss] Wrong rpool used after reinstall!
On 8/3/2011 5:47 PM, Ian Collins wrote:
> On 08/ 4/11 01:29 AM, Stuart James Whitefish wrote:
> > I have Solaris on Sparc boxes available if it would help to do a net
> > install or jumpstart. I have never done those and it looks complicated,
> > although I think I may be able to get to the point in the u9 installer on
> > my Intel box where it asks me whether I want to install from DVD etc. But
> > I may be wrong, and anyway the single user shell in the u9 DVD also panics
> > when I try to import tank, so maybe that won't help.
>
> Put your old drive in a USB enclosure and connect it to another system in
> order to read back the data.

I'm curious - would it work to boot from a live CD, go to a shell, and export/import/rename the old rpool, then boot normally?

> > I have only 4 sata ports on this Intel box so I have to keep pulling
> > cables to be able to boot from a DVD, and then I won't have all my drives
> > available. I cannot move these drives to any other box because they are
> > consumer drives and my servers all have ultras.
>
> Most modern boards will boot from a live USB stick.

-- 
---
Brian Wilson, Solaris SE, UW-Madison DoIT
Room 3114 CSS    608-263-8047
bfwilson(a)doit.wisc.edu
'I try to save a life a day. Usually it's my own.' - John Crichton
---
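For the live-CD idea above, the rename happens at import time: a pool can be imported under a new name so it no longer collides with the freshly installed rpool. A rough sketch, where the pool id placeholder and the name rpool_old are illustrative choices, not commands from this thread:

    # From the live CD shell: list importable pools and note the numeric id
    # of the old root pool.
    zpool import

    # Import it under a different name, with an alternate root so it cannot
    # shadow the running environment (<old-pool-id> is the id from above).
    zpool import -f -R /mnt <old-pool-id> rpool_old

    # When finished, export it again before rebooting.
    zpool export rpool_old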
Re: [zfs-discuss] Expand ZFS storage.
Thanks Guys... :-)
Re: [zfs-discuss] Wrong rpool used after reinstall!
Jim wrote:
> But I may be wrong, and anyway the single user shell in the u9 DVD also
> panics when I try to import tank, so maybe that won't help.

Ian wrote:
> Put your old drive in a USB enclosure and connect it to another system in
> order to read back the data.

Given that update 9 can't import the pool, is this really worth trying? I would have to buy the enclosures; if I had them already I would have tried it in desperation.

Jim wrote:
> I have only 4 sata ports on this Intel box so I have to keep pulling cables
> to be able to boot from a DVD, and then I won't have all my drives
> available. I cannot move these drives to any other box because they are
> consumer drives and my servers all have ultras.

Ian wrote:
> Most modern boards will boot from a live USB stick.

True, but I haven't found a way to get an ISO onto a USB stick that my system can boot from. I was using dd to copy the ISO to the USB drive. Is there some other way?

This is really frustrating. I haven't had any problems with Linux filesystems, but I heard ZFS was safer. It's really ironic that I lost access to so much data after moving it to ZFS. Isn't there any way to get it back on my newly installed U8 system? If I disconnect this pool the system starts fine. Otherwise my questions above in my summary post might be key to getting this working.

Thanks,

Jim
[zfs-discuss] Kernel panic on zpool import. 200G of data inaccessible!
I am opening a new thread since I found somebody else reported a similar failure in May and I didn't see a resolution; hopefully this post will be easier to find for people with similar problems. The original thread was http://opensolaris.org/jive/thread.jspa?threadID=140861

System: snv_151a 64 bit on Intel.

Error: panic[cpu0] assertion failed: zvol_get_stats(os, nv) == 0, file: ../../common/fs/zfs/zfs_ioctl.c, line: 1815

Failure first seen on Solaris 10, update 8.

History: I recently received two 320G drives and realized from reading this list that it would have been better to do the install on the small drives, but I didn't have them at the time. I added the two 320G drives and created the tank mirror. I moved some data from other sources to the tank and then decided to go ahead and do a new install. In preparation for that I moved all the data I wanted to save onto the rpool mirror and then installed Solaris 10 update 8 again on the 320G drives.

When my system rebooted after the installation, I saw that for some reason it used my tank pool as root. I realize now that since it was originally a root pool and had boot blocks, this didn't help. Anyway, I shut down, changed the boot order and then booted into my system. It panicked when trying to access the tank and instantly rebooted. I had to go through this several times until I caught a glimpse of one of the first messages:

assertion failed: zvol_get_stats(os, nv)

Here is what my system looks like when I boot into failsafe mode.

# zpool import
  pool: rpool
    id: 16453600103421700325
 state: ONLINE
action: The pool can be imported using its name or numeric identifier.
config:

        rpool         ONLINE
          mirror      ONLINE
            c0t2d0s0  ONLINE
            c0t3d0s0  ONLINE

  pool: tank
    id: 12861119534757646169
 state: ONLINE
action: The pool can be imported using its name or numeric identifier.
config:

        tank          ONLINE
          mirror      ONLINE
            c0t0d0s0  ONLINE
            c0t1d0s0  ONLINE

# zpool import tank
cannot import 'tank': pool may be in use from other system
use '-f' to import anyway

Here is a photo of my screen (hah hah, old-fashioned screen shot) when Sol 11 starts, now that I have tried importing my pool and it fails constantly:

# zpool import -f tank
http://imageshack.us/photo/my-images/13/zfsimportfail.jpg/

I installed Solaris 11 Express USB via Hiroshi-san's Windows tool. Unfortunately it also panics trying to import the pool, although zpool import shows the pool online with no errors, just like in the output above.

And here is an eerily identical photo capture made by somebody with a similar/identical error:

http://prestonconnors.com/zvol_get_stats.jpg

At first I thought it was a copy of my screenshot, but I see his terminal is white and mine is black. It looks like the problem has been around since 2009, although my problem is with a newly created mirror pool that had plenty of space available (200G in use out of about 500G) and no snapshots were taken.

Similar discussion with a discouraging lack of follow-up:
http://opensolaris.org/jive/message.jspa?messageID=376366

This looks like the defect; it's closed and I see no resolution:
https://defect.opensolaris.org/bz/show_bug.cgi?id=5682

I have about 200G of data on the tank pool, about 100G or so of which I don't have anywhere else. I created this pool specifically to make a safe place to store data that I had accumulated over several years and didn't have organized yet. I can't believe such a serious bug has been around for two years and hasn't been fixed. Can somebody please help me get this data back?

Thank you.
Jim
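One avenue sometimes suggested for assertion panics like this is to relax the panic behaviour long enough to import the pool read-mostly and copy the data off. This is a hedged sketch only, not a fix confirmed in this thread: the zfs:zfs_recover and aok tunables are commonly cited recovery aids, the -N (do not mount) import flag depends on the zpool version installed, and whether any of this gets past a zvol_get_stats() assertion on this particular pool is an assumption.

    # On the recovery system, add the tunables to /etc/system and reboot:
    #   zfs:zfs_recover = 1   - turn some fatal ZFS assertions into warnings
    #   aok = 1               - allow the kernel to continue past assertions
    echo "set zfs:zfs_recover = 1" >> /etc/system
    echo "set aok = 1" >> /etc/system

    # After rebooting, try to bring the pool in without mounting anything,
    # so datasets can be inspected and copied off one at a time.
    zpool import -f -N tank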
Re: [zfs-discuss] Wrong rpool used after reinstall!
I'm opening a new thread since the original subject was not as helpful and I saw a similar problem mentioned in May of this year (2011) and others going back to 2009. The new thread is found at http://opensolaris.org/jive/thread.jspa?threadID=140899
Re: [zfs-discuss] Disable ZIL - persistent
On Aug 5, 2011, at 6:14 AM, Darren J Moffat darr...@opensolaris.org wrote:

> On 08/05/11 13:11, Edward Ned Harvey wrote:
> > After a certain rev, I know you can set the sync property, and it takes
> > effect immediately, and it's persistent across reboots. But that doesn't
> > apply to Solaris 10.
> >
> > My question: Is there any way to make "disabled ZIL" a normal mode of
> > operations in Solaris 10?
> >
> > Particularly: If I do this
> >     echo zil_disable/W0t1 | mdb -kw
> > then I have to remount the filesystem. It's kind of difficult to do this
> > automatically at boot time, and impossible (as far as I know) for rpool.
> > The only solution I see is to write some startup script which applies it
> > to filesystems other than rpool. Which feels kludgy. Is there a better way?
>
> echo "set zfs:zil_disable = 1" >> /etc/system

This is a great way to cure /etc/system viruses :-)

-- richard
Re: [zfs-discuss] Kernel panic on zpool import. 200G of data inaccessible!
On Thu, Aug 4, 2011 at 2:47 PM, Stuart James Whitefish swhitef...@yahoo.com wrote:

> # zpool import -f tank
> http://imageshack.us/photo/my-images/13/zfsimportfail.jpg/

I encourage you to open a support case and ask for an escalation on CR 7056738.

-- 
Mike Gerdts
http://mgerdts.blogspot.com/
Re: [zfs-discuss] zpool import starves machine of memory
Another update:

The configuration of the zpool is 45 x 1 TB drives in three vdevs, each of 15 drives. We should have a net capacity of between 30 and 36 TB (and that agrees with my memory of the pool). I ran zdb -e -d against the pool (not imported) and totaled the size of the datasets, and came up with just about 11 TB. This also agrees with my memory (about 18 TB of data and about a 1.5 compression ratio). If the failed snapshot / zfs recv is 3 TB (as I think it should be) or almost 8 TB (as Oracle is telling me based on some mdb -k examinations of the dataset delete thread), I should still have almost 10 TB free.

I am making an assumption here: that the size listed for the dataset with zdb -d includes all snapshots of that dataset (much like the SIZE field of a zfs list). If that is NOT the case, then I need to come up with a different way to estimate the fullness of this zpool.

On Thu, Aug 4, 2011 at 1:25 PM, Paul Kraus p...@kraus-haus.org wrote:
> Updates to my problem:
>
> 1. The destroy operation appears to be restarting from the same point
> after the system hangs and has to be rebooted. Oracle gave me the
> following to track progress:
>
>     echo '::pgrep zpool$ | ::walk thread | ::findstack -v' | mdb -k | grep dsl_dataset_destroy
>
> then take the first arg of dsl_dataset_destroy and
>
>     echo 'ARG::print dsl_dataset_t ds_phys->ds_used_bytes' | mdb -k
>
> I am logging these values every minute. Yesterday when I started tracking
> this I got a value of 0x75d97516b62; my last data point before the system
> hung was 0x4ee1098bdfd. My first data point today, after rebooting,
> restarting the logging scripts, and restarting the zpool import, is
> 0x7a0b0634a1b. So it looks like I've made no real progress.
>
> 2. It looks like the root cause of the original system crash that left the
> incomplete zfs recv snapshot is that a zfs recv filled the zpool (there
> are two parallel zfs recv's running, one for an old configuration (many
> datasets) and one for the new (one large dataset)). My replication script
> checks for free space before starting the replication, but we had a huge
> data load and replication of it running (3 TB), and when it started there
> was room for it, but other (much smaller) data loads and replication may
> have consumed it. This system has no other activity on it; it is just a
> repository for this replicated data.
>
> So ... it looks like I have:
> - a full zpool
> - an incomplete (corrupt?) snapshot from a zfs recv
>
> ... and every time I try to import this zpool I hang the system due to
> lack of memory (the box has 32 GB of RAM). Any suggestions how to delete /
> destroy this incomplete snapshot without running the system out of RAM?
>
> On Wed, Aug 3, 2011 at 9:56 AM, Paul Kraus p...@kraus-haus.org wrote:
> > An additional data point: when I try to do a zdb -e -d and find the
> > incomplete zfs recv snapshot, I get an error as follows:
> >
> >     # sudo zdb -e -d xxx-yy-01 | grep %
> >     Could not open xxx-yy-01/aaa-bb-01/aaa-bb-01-01/%1309906801, error 16
> >     #
> >
> > Anyone know what error 16 means from zdb and how this might impact
> > importing this zpool?
> >
> > On Wed, Aug 3, 2011 at 9:19 AM, Paul Kraus p...@kraus-haus.org wrote:
> > > I am having a very odd problem, and so far the folks at Oracle Support
> > > have not provided a working solution, so I am asking the crowd here
> > > while still pursuing it via Oracle Support.
> > >
> > > The system is a T2000 running 10U9 with CPU 2010-01 and two J4400
> > > loaded with 1 TB SATA drives. There is one zpool on the J4400 (3 x 15
> > > disk vdev + 3 hot spare). This system is the target for zfs send / recv
> > > replication from our production server. The OS is UFS on local disk.
> > >
> > > While I was on vacation this T2000 hung with out of resource errors.
> > > Other staff tried rebooting, which hung the box. Then they rebooted off
> > > of an old BE (10U9 without CPU 2010-01). Oracle Support had them apply
> > > a couple of patches and an IDR to address zfs stability and reliability
> > > problems, as well as set the following in /etc/system:
> > >
> > >     set zfs:zfs_arc_max = 0x700000000    (which is 28 GB)
> > >     set zfs:arc_meta_limit = 0x700000000 (which is 28 GB)
> > >
> > > The system has 32 GB RAM and 32 (virtual) CPUs. They then tried
> > > importing the zpool and the system hung (after many hours) with the
> > > same out of resource error. At this point they left the problem for
> > > me :-(
> > >
> > > I removed the zfs.cache from the 10U9 + CPU 2010-10 BE and booted from
> > > that. I then applied the IDR (IDR146118-12) and the zfs patch it
> > > depended on (145788-03). I did not include the zfs arc and zfs arc meta
> > > limits as I did not think they were relevant. A zpool import shows the
> > > pool is OK and a sampling with zdb -l of the drives shows good labels.
> > > I started importing the zpool and after many hours it hung the system
> > > with out of resource errors. I had a number of tools running to see
> > > what was going on. The only thing this system is doing is importing the
> > > zpool. ARC had climbed to about 8 GB and then declined to 3 GB by the
> > > time the system hung. This tells me that
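The "logging these values every minute" step above is easy to script. A minimal sketch of such a logger, assuming the dsl_dataset_t address (the first argument to dsl_dataset_destroy in the stack) has already been found and is pasted in by hand; the address and log path below are placeholders, not values from the thread:

    #!/bin/sh
    # Log ds_used_bytes of the dataset being destroyed, once a minute.
    # Find DS_ADDR first with:
    #   echo '::pgrep zpool$ | ::walk thread | ::findstack -v' | mdb -k | grep dsl_dataset_destroy
    DS_ADDR=ffffff01deadbeef          # placeholder address
    LOG=/var/tmp/destroy_progress.log

    while true; do
            printf '%s ' "`date '+%Y-%m-%d %H:%M:%S'`" >> "$LOG"
            echo "${DS_ADDR}::print dsl_dataset_t ds_phys->ds_used_bytes" | mdb -k >> "$LOG"
            sleep 60
    done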
Re: [zfs-discuss] Wrong rpool used after reinstall!
On Thu, Aug 04, 2011 at 03:52:39AM -0700, Stuart James Whitefish wrote:

> Ian wrote:
> > Most modern boards will boot from a live USB stick.
>
> True, but I haven't found a way to get an ISO onto a USB stick that my
> system can boot from. I was using dd to copy the ISO to the USB drive. Is
> there some other way?

Maybe give http://unetbootin.sourceforge.net/ a try.

Bill
Re: [zfs-discuss] Wrong rpool used after reinstall!
On Fri, 5 Aug 2011, Bill wrote:

> > True, but I haven't found a way to get an ISO onto a USB stick that my
> > system can boot from. I was using dd to copy the ISO to the USB drive.
> > Is there some other way?
>
> Maybe give http://unetbootin.sourceforge.net/ a try.

This package seems to list support for most x86 OSs EXCEPT for *Solaris.

Bob
-- 
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
Re: [zfs-discuss] Wrong rpool used after reinstall!
On 08/ 4/11 10:52 PM, Stuart James Whitefish wrote:

> > Ian wrote:
> > Put your old drive in a USB enclosure and connect it to another system in
> > order to read back the data.
>
> Given that update 9 can't import the pool, is this really worth trying?

I would use a newer (Express maybe) system.

> > Most modern boards will boot from a live USB stick.
>
> True, but I haven't found a way to get an ISO onto a USB stick that my
> system can boot from. I was using dd to copy the ISO to the USB drive. Is
> there some other way?

Recent OpenSolaris-based builds have a handy utility, usbcopy.

> This is really frustrating. I haven't had any problems with Linux
> filesystems, but I heard ZFS was safer. It's really ironic that I lost
> access to so much data after moving it to ZFS. Isn't there any way to get
> it back on my newly installed U8 system? If I disconnect this pool the
> system starts fine. Otherwise my questions above in my summary post might
> be key to getting this working.

If you have support, badger them. Otherwise use a newer system.

-- 
Ian.
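For the usbcopy suggestion above, the tool expects the USB image (not the ISO) that is distributed alongside the live CD. A hedged sketch of its use, with an illustrative image file name and device; if only dd is available, the usual fallback is to write a *.usb image to the raw stick device:

    # On an OpenSolaris / Solaris 11 Express system with the distribution
    # tools installed; the image name below is an example. usbcopy prompts
    # for the target USB device.
    usbcopy sol-11-exp-201011-live-x86.usb

    # dd fallback: write the *.usb image (not the .iso) to the whole stick.
    # c2t0d0p0 is an example device - double-check with rmformat first!
    dd if=sol-11-exp-201011-live-x86.usb of=/dev/rdsk/c2t0d0p0 bs=1024k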
Re: [zfs-discuss] Large scale performance query
Are mirrors really a realistic alternative?

I mean, if I have to resilver a raid with 3 TB discs, it can take days, I suspect. With 4 TB disks it can take a week, maybe. So if I use a mirror and one disk breaks, then I have only a single copy left while the mirror repairs. The repair will take a long time and will stress the disks, which means the other disk might malfunction. Therefore, I think raidz2 or raidz3, which allow 2 or 3 disks to break while you resilver, are preferable. Hence, a mirror is not a realistic alternative when using large disks.

True/false? What do you guys say?
[zfs-discuss] Problem booting after zfs upgrade
After upgrading to zpool version 29 / zfs version 5 on a S10 test system via the kernel patch 144501-19, it will now boot only as far as the grub menu.

What is a good Solaris rescue image that I can boot that will allow me to import this rpool to look at it (given the newer version)?

Thanks.
Re: [zfs-discuss] Large scale performance query
On 08/ 6/11 10:42 AM, Orvar Korvar wrote:

> Are mirrors really a realistic alternative?

To what? Some context would be helpful.

> I mean, if I have to resilver a raid with 3 TB discs, it can take days, I
> suspect. With 4 TB disks it can take a week, maybe. So if I use a mirror
> and one disk breaks, then I have only a single copy left while the mirror
> repairs. The repair will take a long time and will stress the disks, which
> means the other disk might malfunction. Therefore, I think raidz2 or
> raidz3, which allow 2 or 3 disks to break while you resilver, are
> preferable. Hence, a mirror is not a realistic alternative when using
> large disks.
>
> True/false? What do you guys say?

I don't have any exact like-for-like comparison data, but from what I've seen a mirror resilvers a lot faster than a drive in a raidz(2) vdev.

-- 
Ian.
Re: [zfs-discuss] Problem booting after zfs upgrade
On 08/ 6/11 11:48 AM, stuart anderson wrote:

> After upgrading to zpool version 29 / zfs version 5 on a S10 test system
> via the kernel patch 144501-19, it will now boot only as far as the grub
> menu.
>
> What is a good Solaris rescue image that I can boot that will allow me to
> import this rpool to look at it (given the newer version)?

A Solaris 11 Express live CD.

-- 
Ian.
Re: [zfs-discuss] Large scale performance query
Generally, mirrors resilver MUCH faster than RAIDZ, and you only lose redundancy on that stripe, so combined, you're much closer to RAIDZ2 odds than you might think, especially with hot spare(s), which I'd recommend.

When you're talking about IOPS, each stripe can support 1 simultaneous user.

Writing: Each RAIDZ group = 1 stripe. Each mirror group = 1 stripe. So, 216 drives can be 24 stripes or 108 stripes.

Reading: Each RAIDZ group = 1 stripe. Each mirror group = 1 stripe per drive. So, 216 drives can be 24 stripes or 216 stripes. Actually, reads from mirrors are even more efficient than reads from stripes, because the software can optimally load balance across mirrors.

So, back to the original poster's question: 9 stripes might be enough to support 5 clients, but 216 stripes could support many more.

Actually, this is an area where RAID5/6 has an advantage over RAIDZ, if I understand correctly, because for RAID5/6 on read-only workloads, each drive acts like a stripe. For workloads with writing, though, RAIDZ is significantly faster than RAID5/6, but mirrors/RAID10 give the best performance for all workloads.
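To make the arithmetic above concrete, here is a back-of-the-envelope sketch of random-read capacity for the 216-drive example. The per-disk IOPS figure and the raidz width are assumptions for illustration, not measurements from this thread:

    #!/bin/sh
    # Rough random-read estimate: one concurrent "read stream" per stripe
    # for raidz, one per drive for mirrors (as argued above).
    DISKS=216
    PER_DISK_IOPS=100       # assumed for a 7200 rpm SATA drive
    RAIDZ_WIDTH=9           # assumed layout: 24 x 9-disk raidz

    echo "raidz  read streams: $((DISKS / RAIDZ_WIDTH)) -> ~$(((DISKS / RAIDZ_WIDTH) * PER_DISK_IOPS)) IOPS"
    echo "mirror read streams: $DISKS -> ~$((DISKS * PER_DISK_IOPS)) IOPS"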
Re: [zfs-discuss] Disable ZIL - persistent
> From: Darren J Moffat [mailto:darr...@opensolaris.org]
> Sent: Friday, August 05, 2011 10:14 AM
>
> > echo "set zfs:zil_disable = 1" >> /etc/system
>
> This is a great way to cure /etc/system viruses :-)

LOL! :-)

Thank you.
Re: [zfs-discuss] Problem booting after zfs upgrade
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Ian Collins
>
> On 08/ 6/11 11:48 AM, stuart anderson wrote:
> > After upgrading to zpool version 29 / zfs version 5 on a S10 test system
> > via the kernel patch 144501-19, it will now boot only as far as the grub
> > menu.
> >
> > What is a good Solaris rescue image that I can boot that will allow me to
> > import this rpool to look at it (given the newer version)?
>
> A Solaris 11 Express live CD.

FYI: Before a certain rev, if you zpool upgrade, you have silently invalidated your grub boot blocks, and you simply need to know, based on past experience, that you need to run installgrub. After a certain rev, the system will notify you with a helpful informative message ("You need to installgrub", or something like that). And after yet a later rev, it does the installgrub for you automatically.

Or maybe I'm just talking about starting a new mirror of rpool; maybe the same thing is not true in regards to zpool upgrade. I don't know for sure. In any event, you need to do something like this:

installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c1t0d0s0

(substitute whatever device slice you have used for rpool)
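Since the pool in question no longer boots past grub, the installgrub step has to be run from rescue media. A hedged sketch of the usual sequence from a Solaris 11 Express live CD; the BE name, alternate root, and device slice are illustrative and must be adapted to the actual system:

    # From the live CD shell: import the root pool under an alternate root
    # so its filesystems mount under /a instead of over the live environment.
    zpool import -f -R /a rpool

    # Mount the boot environment that holds the grub stage files
    # ("s10_u9" is a placeholder - check with 'zfs list -r rpool/ROOT').
    zfs mount rpool/ROOT/s10_u9

    # Reinstall the grub boot blocks using that BE's stage files.
    installgrub /a/boot/grub/stage1 /a/boot/grub/stage2 /dev/rdsk/c1t0d0s0

    # Clean up and reboot from disk.
    zpool export rpool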