Re: [zfs-discuss] Replacing root pool disk
hi

By default the disk label's s2 slice covers the whole disk. That was fine for UFS for a long time, but ZFS does not like the overlap, so you just need to run format and either delete s2, or keep s2 and delete all the other slices. (By default, format/fdisk creates s2 covering the whole disk plus s7 for boot, so there is also an overlap between s2 and s7 :-( ) ZFS root needs this fixed because it requires a slice, not the whole disk (no "s?"-less device).

my 2c

regards

On 4/12/2012 1:35 PM, Peter Wood wrote:

Hi,

I was following the instructions in the ZFS Troubleshooting Guide on how to replace a disk in the root pool on an x86 system. I'm using OpenIndiana, ZFS pool v.28 with a mirrored system rpool. The replacement disk is brand new.

root:~# zpool status
  pool: rpool
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
  scan: resilvered 17.6M in 0h0m with 0 errors on Wed Apr 11 17:45:16 2012
config:

        NAME                         STATE     READ WRITE CKSUM
        rpool                        DEGRADED     0     0     0
          mirror-0                   DEGRADED     0     0     0
            c2t5000CCA369C55DB8d0s0  OFFLINE      0   126     0
            c2t5000CCA369D5231Cd0s0  ONLINE       0     0     0

errors: No known data errors
root:~#

I'm not very familiar with Solaris partitions and slices, so somewhere in the format/partition commands I must have made a mistake, because when I try to replace the disk I get the following error:

root:~# zpool replace rpool c2t5000CCA369C55DB8d0s0 c2t5000CCA369C89636d0s0
invalid vdev specification
use '-f' to override the following errors:
/dev/dsk/c2t5000CCA369C89636d0s0 overlaps with /dev/dsk/c2t5000CCA369C89636d0s2
root:~#

I used -f and it worked, but I was wondering: is there a way to completely reset the new disk? Remove all partitions and start from scratch.

Thank you

Peter
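One way to "reset" a replacement root disk is to rewrite its label from scratch and then copy the slice table over from the healthy mirror half. A rough sketch, assuming the new disk is c2t5000CCA369C89636d0 and the good disk is c2t5000CCA369D5231Cd0 (substitute your own device names, and double-check before writing a label; this only works cleanly when both disks are the same size):

# fdisk -B /dev/rdsk/c2t5000CCA369C89636d0p0
      (x86: recreate a single Solaris2 fdisk partition spanning the disk)
# prtvtoc /dev/rdsk/c2t5000CCA369D5231Cd0s2 | fmthard -s - /dev/rdsk/c2t5000CCA369C89636d0s2
      (copy the VTOC from the good disk, so s0 matches exactly)
# zpool replace rpool c2t5000CCA369C55DB8d0s0 c2t5000CCA369C89636d0s0
# installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c2t5000CCA369C89636d0s0
      (don't forget the boot blocks on the new root disk)

If the disks differ in size, build the s0 slice by hand in format instead of copying the VTOC.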
Re: [zfs-discuss] webserver zfs root lock contention under heavy load
hi

You did not answer the questions:

- What is the RAM of the server? How many sockets and cores, etc.?
- What is the block size of ZFS?
- What is the cache RAM of your SAN array?
- What is the block size / stripe size of the RAID in the SAN array? RAID-5 or what?
- What is your test program, and how is it driven (from what kind of client)?

regards

On 3/26/2012 11:13 PM, Aubrey Li wrote:

On Tue, Mar 27, 2012 at 1:15 AM, Jim Klimov <j...@cos.ru> wrote:
> Well, as a further attempt down this road, is it possible for you to rule out ZFS from swapping - i.e. if RAM amounts permit, disable the swap at all (swap -d /dev/zvol/dsk/rpool/swap) or relocate it to dedicated slices of the same or, better yet, separate disks?

Thanks Jim for your suggestion!

> If you do have lots of swapping activity (which can be seen in vmstat 1 as the si/so columns) going on in a zvol, you're likely to get much fragmentation in the pool, and searching for contiguous stretches of space can become tricky (and time-consuming), or larger writes can get broken down into many smaller random writes and/or gang blocks, which is also slower. At least such waiting on disks can explain the overall large kernel times.

I took swapping activity into account; even when the CPU% is 100%, si (swap-ins) and so (swap-outs) are always ZERO.

> You can also see the disk wait time ratio in iostat -xzn 1 column %w, and the disk busy time ratio in %b (second and third from the right). I don't remember you posting that. If these are counting in the tens, or even close or equal to 100%, then your disks are the actual bottleneck. Speeding up that subsystem, including the addition of cache (ARC RAM, L2ARC SSD, maybe ZIL SSD/DDRDrive) and combatting fragmentation by moving swap and other scratch spaces to dedicated pools or raw slices, might help.

My storage system is not very busy, and there are only read operations:

# iostat -xnz 3
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
  112.4    0.0 1691.5    0.0  0.0  0.5    0.0    4.8   0  41 c11t0d0
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
  118.7    0.0 1867.0    0.0  0.0  0.5    0.0    4.5   0  42 c11t0d0
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
  127.7    0.0 2121.6    0.0  0.0  0.6    0.0    4.7   0  44 c11t0d0
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
  141.3    0.0 2158.5    0.0  0.0  0.7    0.0    4.6   0  48 c11t0d0

Thanks,
-Aubrey
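For anyone wanting to act on Jim's swap suggestion, the steps look roughly like this (the zvol path below is the default Solaris layout and the slice name is just an example; check swap -l for your actual devices first):

# swap -l                                  (list the active swap devices)
# swap -d /dev/zvol/dsk/rpool/swap         (remove the zvol-backed swap)
# swap -a /dev/dsk/c11t1d0s1               (optionally re-add swap on a dedicated raw slice)
# vmstat 1 5                               (confirm si/so stay at 0 under load)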
Re: [zfs-discuss] Advice for migrating ZFS configuration
Just note that you can have a different zpool name but keep the same old mount point for the export filesystem.

-LT

On 3/8/2012 8:40 AM, Paul Kraus wrote:

Lots of suggestions (not included here), but... with the exception of Cindy's suggestion of using 4 disks and mirroring (zpool attach two new disks to the existing vdevs), I would absolutely NOT do anything unless I had a known good backup of the data! I have seen too many cases described here on this list of people trying complicated procedures with ZFS, making one small mistake, and losing their data or spending weeks or months trying to recover it.

Regarding IMPORT / EXPORT, these functions have two real purposes in my mind:

1. You want to move a zpool from one host to another. You EXPORT from the first host, physically move the disks, then IMPORT on the new host.
2. You want (or need) to physically change the connectivity between the disks and the host, and implicit in that is that the device paths will change. EXPORT, change connectivity, IMPORT.

Once again, I have seen many cases described on this list of folks who moved disks around, which ZFS is _supposed_ to handle, but then had a problem. I use ZFS first for reliability and second for performance. With that in mind, one of my primary rules for ZFS is to NOT move disks around without first exporting the zpool. I have done some pretty rude things to the devices underlying vdevs (disappearing and then much later reappearing, mostly in test but occasionally in production), and I have yet to lose any data, BUT none of the devices changed path in the process.

On Wed, Mar 7, 2012 at 4:38 PM, Bob Doolittle <bob.doolit...@oracle.com> wrote:

> Hi,
>
> I had a single-disk zpool (export) and was given two new disks for expanded storage. All three disks are identically sized, no slices/partitions. My goal is to create a raidz1 configuration of the three disks, containing the data in the original zpool. However, I got off on the wrong foot by doing a zpool add of the first disk. Apparently this has simply increased my storage without creating a raidz config. Unfortunately, there appears to be no simple way to just remove that disk now and do a proper raidz create of the other two. Nor am I clear on how import/export works and whether that's a good way to copy content from one zpool to another on a single host.
>
> Can somebody guide me? What's the easiest way out of this mess, so that I can move from what is now a simple two-disk zpool (less than 50% full) to a three-disk raidz configuration, starting with one unused disk? In the end I want the three-disk raidz to have the same name (and mount point) as the original zpool. There must be an easy way to do this.
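A small sketch of that point, since it matters for Bob's "same name and mount point" requirement — the mount point is just a dataset property, so the new pool's name does not have to match the old one (pool and device names below are illustrative):

# zpool create tank2 raidz c1t1d0 c1t2d0 c1t3d0
# zfs set mountpoint=/export tank2

Anything referencing /export never sees the pool name change.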
Re: [zfs-discuss] Advice for migrating ZFS configuration
IMHO, there is no easy way out for you:

1) Tape backup, then restore.
2) Find a larger USB/SATA disk, copy the data over, then restore later after the raidz1 is set up.

-LT

On 3/7/2012 4:38 PM, Bob Doolittle wrote:

Hi,

I had a single-disk zpool (export) and was given two new disks for expanded storage. All three disks are identically sized, no slices/partitions. My goal is to create a raidz1 configuration of the three disks, containing the data in the original zpool. However, I got off on the wrong foot by doing a zpool add of the first disk. Apparently this has simply increased my storage without creating a raidz config. Unfortunately, there appears to be no simple way to just remove that disk now and do a proper raidz create of the other two. Nor am I clear on how import/export works and whether that's a good way to copy content from one zpool to another on a single host.

Can somebody guide me? What's the easiest way out of this mess, so that I can move from what is now a simple two-disk zpool (less than 50% full) to a three-disk raidz configuration, starting with one unused disk? In the end I want the three-disk raidz to have the same name (and mount point) as the original zpool. There must be an easy way to do this.

Thanks for any assistance.

-Bob

P.S. I would appreciate being kept on the CC list for this thread to avoid digest mailing delays.
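Option 2 with zfs send/receive would look something like the sketch below. Device names and the "backup" pool name are made up for illustration; "export" is the existing pool, and a dry-run receive (zfs receive -n) is worth doing before anything destructive:

# zpool create backup c5t0d0                          (the temporary USB/SATA disk)
# zfs snapshot -r export@migrate
# zfs send -R export@migrate | zfs receive -dF backup
# zpool destroy export                                (only after verifying the copy!)
# zpool create export raidz c1t0d0 c1t1d0 c1t2d0
# zfs send -R backup@migrate | zfs receive -dF export
      (the -R receive re-created the @migrate snapshot on backup, so it can be sent back)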
Re: [zfs-discuss] need hint on pool setup
What is your main application for ZFS? E.g. just NFS or iSCSI, for home directories or VMs? Or Windows clients? Is performance important, or is space more important? What is the memory of your server? Do you want to use ZIL or L2ARC devices? What is your backup or DR plan? You need to answer all these questions first.

my 2c

On 1/31/2012 3:44 PM, Thomas Nau wrote:

Dear all,

We have two JBODs with 20 or 21 drives available per JBOD, hooked up to a server. We are considering the following setups:

- RAIDZ2 vdevs made of 4 drives
- RAIDZ2 vdevs made of 6 drives

The first option wastes more disk space but can survive a JBOD failure, whereas the second is more space-efficient but the system goes down when a JBOD goes down. Each of the JBODs comes with dual controllers, redundant fans, and power supplies, so do I need to be paranoid and use option #1? Of course it also gives us more IOPS, but high-end logging devices should take care of that.

Thanks for any hint

Thomas
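For reference, option #1's JBOD-failure tolerance comes from splitting each 4-disk RAIDZ2 vdev evenly across the enclosures, so losing one JBOD costs exactly the two parity-covered disks in every vdev. A sketch with made-up device names (c1t* in JBOD 1, c2t* in JBOD 2):

# zpool create tank \
    raidz2 c1t0d0 c1t1d0 c2t0d0 c2t1d0 \
    raidz2 c1t2d0 c1t3d0 c2t2d0 c2t3d0
      (...and so on for the remaining drives)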
Re: [zfs-discuss] zfs defragmentation via resilvering?
It seems that S11 shadow migration can help :-)

On 1/7/2012 9:50 AM, Jim Klimov wrote:

Hello all,

I understand that relatively high fragmentation is inherent to ZFS due to its COW and the possible intermixing of metadata and data blocks (of which metadata path blocks are likely to expire and get freed relatively quickly).

I believe it was sometimes implied on this list that such fragmentation for static data can currently be combatted only by zfs send-ing existing pools' data to other pools on some reserved hardware, then clearing the original pools and sending the data back. This is time-consuming, disruptive, and requires lots of extra storage idling for this task (or, at best, for backup purposes).

I wonder how resilvering works — namely, does it write blocks as they were, or in an optimized (defragmented) fashion — in two use cases:

1) Resilvering from a healthy array (vdev) onto a spare drive in order to replace one of the healthy drives in the vdev;
2) Resilvering a degraded array from the existing drives onto a new drive in order to repair the array and make it redundant again.

Also, are these two modes different at all? I.e., if I were to ask ZFS to replace a working drive with a spare as in case (1), can I do it at all, and would its data simply be copied over, or reconstructed from the other drives, or some mix of the two?

Finally, what would the gurus say: does fragmentation pose a heavy problem on nearly-filled-up pools made of spinning HDDs (I believe so, at least judging from the performance degradation problems writing to 80+%-filled pools), and can fragmentation be effectively combatted on ZFS at all (with or without BP rewrite)? For example, can (does?) metadata live separately from data in some dedicated disk areas, while data blocks are written as contiguously as they can be?

Many Windows defrag programs group files into several zones on the disk based on their last-modified times, so that old WORM files remain defragmented for a long time. There are thus some empty areas reserved for new writes, as well as for moving newly discovered WORM files to the WORM zones (free space permitting)... I wonder if this is viable with ZFS (COW and snapshots involved) when BP rewrites are implemented? Perhaps such zoned defragmentation can be done based on block creation date (TXG number) and the knowledge that some blocks in a certain order comprise at least one single file (maybe more, due to clones and dedup) ;)

What do you think?

Thanks,
//Jim Klimov
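For the curious: Solaris 11 shadow migration pulls data from an old filesystem into a new one on first access, which as a side effect rewrites (and thus defragments) it. A minimal sketch, assuming a local source path (the shadow property also accepts nfs:// URIs, and the source must be read-only for the duration):

# zfs set readonly=on oldpool/data
# zfs create -o shadow=file:///oldpool/data newpool/data
# shadowstat          (watch migration progress, if the utility is installed)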
Re: [zfs-discuss] Thinking about splitting a zpool in system and data
Maybe one can do the following (assume disks c0t0d0 and c0t1d0):

1) Split the rpool mirror: zpool split rpool newpool c0t1d0s0
1b) zpool destroy newpool
2) Partition the 2nd hdd c0t1d0 into two slices (s0 and s1)
3) zpool create rpool2 c0t1d0s1
4) Use lucreate -c c0t0d0s0 -n new-zfsbe -p c0t1d0s0
5) lustatus c0t0d0s0 new-zfsbe
6) luactivate new-zfsbe
7) init 6

Now you have two BEs, old and new. You can create dpool on slice 1, add L2ARC and ZIL, and repartition c0t0d0. If you want, you can create rpool on c0t0d0s0 and a new BE, so everything will be named rpool for the root pool. SWAP and DUMP can be on a different rpool.

good luck

On 1/6/2012 12:32 AM, Jesus Cea wrote:

Sorry if this list is inappropriate. Pointers welcome.

Using Solaris 10 Update 10, x86-64. I have been a ZFS heavy user since it became available, and I love the system.

My servers are usually small (two disks) and usually hosted in a datacenter, so I usually create a zpool used both for system and data. That is, the entire system contains a single two-disk zpool. This has worked nicely so far. But my new servers have SSDs too. Using them for L2ARC is easy enough, but I cannot use them as ZIL because no separate ZIL device can be used in root zpools. Ugh, that hurts!

So I am thinking about splitting my full two-disk zpool into two zpools, one for system and one for data, both mirrored across both disks. So I would have two slices per disk.

I have the system in production in a datacenter I cannot physically access, but I have remote KVM access. The servers are in production; I can't reinstall, but I could be allowed small (minutes-long) downtimes for a while.

My plan is this:

1. Do a scrub to be sure the data is OK on both disks.
2. Break the mirror. The A disk will keep working; the B disk is idle.
3. Partition the B disk with two slices instead of the current full-disk slice.
4. Create a "system" zpool on B.
5. Snapshot zpool/ROOT on A and zfs send it to "system" on B. Repeat several times until we have a recent enough copy. This stream will contain the OS and the zones' root datasets. I have zones.
6. Change GRUB to boot from "system" instead of "zpool". Cross fingers and reboot. Do I have to touch the bootfs property? Now ideally I would be able to have "system" as the zpool root. The zones would be mounted from the old datasets.
7. If everything is OK, I would zfs send the data from the old zpool to the new one. After doing this a few times to get a recent copy, I would stop the zones and do a final copy, to be sure I have all the data with no changes in progress.
8. I would change the zone manifests to mount the data in the new zpool.
9. I would restart the zones and make sure everything seems OK.
10. I would restart the computer to be sure everything works. So far, if this doesn't work, I could go back to the old situation simply by changing the GRUB boot to the old zpool.
11. If everything works, I would destroy the original zpool on A, partition the disk, and recreate the mirroring, with B as the source.
12. Reboot to be sure everything is OK.

So, my questions:

a) Is this workflow reasonable, and would it work? Is the procedure documented anywhere? Suggestions? Pitfalls?

b) *MUST* SWAP and DUMP ZVOLs reside in the root zpool, or can they live in a non-system zpool (always plugged in and available)? I would like to have a quite small system zpool (let's say 30GB; I use Live Upgrade and quite a few zones), but my swap is huge (32GB, and yes, I use it) and I would rather have SWAP and DUMP in the data zpool, if that is supported.

c) Currently Solaris decides to activate write caching on the SATA disks — nice. What would happen if I still use the complete disks BUT with two slices instead of one? Would write cache still be enabled? And yes, I have checked that the cache flush works as expected, because I can only do around one hundred write+syncs per second.

Advice?

--
Jesus Cea Avion / j...@jcea.es - http://www.jcea.es/ / jabber: xmpp:j...@jabber.org
Re: [zfs-discuss] Thinking about splitting a zpool in system and data
correction

On 1/6/2012 3:34 PM, Hung-Sheng Tsao (Lao Tsao 老曹) Ph.D. wrote:

> Maybe one can do the following (assume c0t0d0 and c0t1d0):
>
> 1) Split the rpool mirror: zpool split rpool newpool c0t1d0s0
> 1b) zpool destroy newpool
> 2) Partition the 2nd hdd c0t1d0 into two slices (s0 and s1)
> 3) zpool create rpool2 c0t1d0s1        === should be c0t1d0s0
> 4) Use lucreate -c c0t0d0s0 -n new-zfsbe -p c0t1d0s0        === the -p argument should be rpool2 (the pool name)
> 5) lustatus c0t0d0s0 new-zfsbe
> 6) luactivate new-zfsbe
> 7) init 6
>
> Now you have two BEs, old and new. You can create dpool on slice 1, add L2ARC and ZIL, and repartition c0t0d0. If you want, you can create rpool on c0t0d0s0 and a new BE, so everything will be named rpool for the root pool. SWAP and DUMP can be on a different rpool.
>
> good luck
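Putting the original post and the correction together, the sequence would look roughly like this. This is a sketch, not a tested procedure: device names follow the example above, the BE names are arbitrary, and lucreate's -p takes the target root pool name:

# zpool split rpool newpool c0t1d0s0      (detach the second disk as a throwaway pool)
# zpool destroy newpool
# format c0t1d0                           (relabel: s0 for the new root pool, s1 for data)
# zpool create rpool2 c0t1d0s0
# lucreate -c old-zfsbe -n new-zfsbe -p rpool2
# luactivate new-zfsbe
# init 6
# zpool create dpool c0t1d0s1             (after rebooting into the new BE)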
Re: [zfs-discuss] SATA hardware advice
AFAIK, most ZFS-based storage appliances have moved to SAS drives at 7200 rpm or 15k rpm; most SSDs are SATA and connect to the on-board SATA I/O chips.

On 12/19/2011 9:59 AM, tono wrote:

Thanks for the suggestions, especially all the HP info and build pictures.

Two things crossed my mind on the hardware front. The first is regarding the SSDs you have pictured, mounted in sleds. Any ProLiant that I've read about connects the hot-swap drives via a SAS backplane. So how did you avoid that (physically) to make the direct SATA connections?

The second is regarding a conversation I had with HP pre-sales. A rep actually told me, in no uncertain terms, that using non-HP HBAs, RAM, or drives would completely void my warranty. I assume this is BS, but I wonder if anyone has ever gotten resistance due to 3rd-party hardware. In the States, at least, there is the Magnuson–Moss act. I'm just not sure if it applies to servers.

Back to SATA though. I can appreciate fully not wanting to take unnecessary risks, but there are a few things that don't sit well with me. A little background: this is to be a backup server for a small/medium business. The data, of course, needs to be safe, but we don't need extreme HA.

I'm aware of two specific issues with SATA drives: the TLER/CCTL setting, and the issue with SAS expanders. I have to wonder if these account for most of the bad rap that SATA drives get. Expanders are built into nearly all of the JBODs and storage servers I've found (including the one in the serverfault post), so they must be in common use.

So I'll ask again: are there any issues when connecting SATA drives directly to an HBA? People are, after all, talking left and right about using SATA SSDs... as long as they are connected directly to the MB controller.

We might just do SAS at this point for peace of mind. It just bugs me that you can't use inexpensive disks in a R.A.I.D. I would think that RAIDZ and AHCI could handle just about any failure mode by now.
Re: [zfs-discuss] Very poor pool performance - no zfs/controller errors?!
What is the output of zpool status for pool1 and pool2? It seems that you have a mixed configuration in pool3, with plain disks and mirrors.

On 12/18/2011 9:53 AM, Jan-Aage Frydenbø-Bruvoll wrote:

Dear List,

I have a storage server running OpenIndiana with a number of storage pools on it. All the pools' disks come off the same controller, and all pools are backed by SSD-based L2ARC and ZIL. Performance is excellent on all pools but one, and I am struggling greatly to figure out what is wrong.

A very basic test shows the following - pretty much typical performance at the moment:

root@stor:/# for a in pool1 pool2 pool3; do dd if=/dev/zero of=$a/file bs=1M count=10; done
10+0 records in
10+0 records out
10485760 bytes (10 MB) copied, 0.00772965 s, 1.4 GB/s
10+0 records in
10+0 records out
10485760 bytes (10 MB) copied, 0.00996472 s, 1.1 GB/s
10+0 records in
10+0 records out
10485760 bytes (10 MB) copied, 71.8995 s, 146 kB/s

The zpool status of the affected pool is:

root@stor:/# zpool status pool3
  pool: pool3
 state: ONLINE
  scan: resilvered 222G in 24h2m with 0 errors on Wed Dec 14 15:20:11 2011
config:

        NAME          STATE     READ WRITE CKSUM
        pool3         ONLINE       0     0     0
          c1t0d0      ONLINE       0     0     0
          c1t1d0      ONLINE       0     0     0
          c1t2d0      ONLINE       0     0     0
          c1t3d0      ONLINE       0     0     0
          c1t4d0      ONLINE       0     0     0
          c1t5d0      ONLINE       0     0     0
          c1t6d0      ONLINE       0     0     0
          c1t7d0      ONLINE       0     0     0
          c1t8d0      ONLINE       0     0     0
          c1t9d0      ONLINE       0     0     0
          c1t10d0     ONLINE       0     0     0
          mirror-12   ONLINE       0     0     0
            c1t26d0   ONLINE       0     0     0
            c1t27d0   ONLINE       0     0     0
          mirror-13   ONLINE       0     0     0
            c1t28d0   ONLINE       0     0     0
            c1t29d0   ONLINE       0     0     0
          mirror-14   ONLINE       0     0     0
            c1t34d0   ONLINE       0     0     0
            c1t35d0   ONLINE       0     0     0
        logs
          mirror-11   ONLINE       0     0     0
            c2t2d0p8  ONLINE       0     0     0
            c2t3d0p8  ONLINE       0     0     0
        cache
          c2t2d0p12   ONLINE       0     0     0
          c2t3d0p12   ONLINE       0     0     0

errors: No known data errors

Ditto for the disk controller - MegaCli reports zero errors, be that on the controller itself, on this pool's disks, or on any of the other attached disks.

I am pretty sure I am dealing with a disk-based problem here, i.e. a flaky disk that is just slow without exhibiting any actual data errors, holding the rest of the pool back, but I am at a loss as to how to pinpoint what is going on.

Would anybody on the list be able to give me any pointers on how to dig up more detailed information about the pool's/hardware's performance?

Thank you in advance for your kind assistance.

Best regards
Jan
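To pinpoint a single slow disk, per-device latency statistics are usually the quickest route. Two standard commands worth running while the pool is under load (pool name as above; intervals are arbitrary):

# zpool iostat -v pool3 5    (per-vdev ops and bandwidth; a lopsided vdev stands out)
# iostat -xzn 5              (watch asvc_t and %b per disk; one disk with service
                              times far above its peers is the usual suspect)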
Re: [zfs-discuss] CPU sizing for ZFS/iSCSI/NFS server
Please check out the ZFS appliance 7120 spec: 2.4GHz / 24GB memory and a ZIL (SSD). Maybe try the ZFS storage simulator software.

regards

On 12/12/2011 2:28 PM, Albert Chin wrote:

We're preparing to purchase an X4170M2 as an upgrade for our existing X4100M2 server for ZFS, NFS, and iSCSI. We have a choice of CPUs, some more expensive than others. Our current system has a dual-core 1.8GHz Opteron 2210 CPU with 8GB. Seems like either a 6-core Intel E5649 2.53GHz CPU or a 4-core Intel E5620 2.4GHz CPU would be more than enough. Based on what we're using the system for, it should be more I/O bound than CPU bound. We are doing compression in ZFS, but that shouldn't be too CPU intensive. Seems we should care more about more cores than high GHz. Recommendations?
Re: [zfs-discuss] CPU sizing for ZFS/iSCSI/NFS server
4 cores @ 2.4GHz.

On 12/12/2011 2:44 PM, Albert Chin wrote:

> On Mon, Dec 12, 2011 at 02:40:52PM -0500, Hung-Sheng Tsao (Lao Tsao 老曹) Ph.D. wrote:
>> please check out the ZFS appliance 7120 spec 2.4Ghz /24GB memory and ZIL(SSD) may be try the ZFS simulator SW
>
> Good point. Thanks.
Re: [zfs-discuss] CPU sizing for ZFS/iSCSI/NFS server
On 12/12/2011 3:02 PM, Gary Driggs wrote:

> On Dec 12, 2011, at 11:42 AM, "Hung-Sheng Tsao (Lao Tsao 老曹) Ph.D." wrote:
>> please check out the ZFS appliance 7120 spec 2.4Ghz /24GB memory and ZIL(SSD)
>
> Do those appliances also use the F20 PCIe flash cards?

No; those controllers need their slots for SAS HBAs to build an HA-cluster configuration, or for FCoE, FC, or 10GbE HBAs.

- The 7120 supports only logzilla (ZIL SSD) devices.
- The 7320 (X4170M2 head) supports readzilla (L2ARC) and 18GB logzilla (ZIL) devices.
- The 7420 (X4470M2 head) supports readzilla (500GB or 1TB) and logzilla devices.

> I know the Exadata storage cells use them, but they aren't utilizing ZFS in the Linux version of the X2-2. Has that changed with the Solaris x86 versions of the appliance? Also, does OCZ or someone make an equivalent to the F20 now?
>
> -Gary
Re: [zfs-discuss] ZFS not starting
FYI: http://www.oracle.com/technetwork/articles/servers-storage-admin/o11-113-size-zfs-dedup-1354231.html

Never too late :-(

On 12/1/2011 5:19 PM, Freddie Cash wrote:

> The system has 6GB of RAM and a 10GB swap partition. I added a 30GB swap file but this hasn't helped.

ZFS doesn't use swap for the ARC (it's wired, aka unswappable, memory). And ZFS uses the ARC for dedupe support. You will need to find a lot of extra RAM to stuff into that machine in order for it to boot correctly, load the dedupe tables into the ARC, process the intent log, and then import the pool. And you'll need that extra RAM in order to destroy the ZFS filesystem that has dedupe enabled.

Basically, your DDT (dedupe table) is running you out of ARC space and livelocking (or is it deadlocking? never can keep those terms straight) the box.

You can remove the RAM once you have things working again. Just don't re-enable dedupe until you have at least 16 GB of RAM in the box that can be dedicated to ZFS. And be sure to add a cache device to the pool.

I just went through something similar with an 8 GB ZFS box (RAM is on order, but the purchasing dept ordered from the wrong supplier, so we're stuck waiting for it to arrive) where I tried to destroy a dedupe'd filesystem. Exact same results as you. Stole RAM out of a different server temporarily to get things working on this box again.

> # sysctl hw.physmem
> hw.physmem: 6363394048
> # sysctl vfs.zfs.arc_max
> vfs.zfs.arc_max: 5045088256
> (I lowered arc_max to 1GB but it hasn't helped)

DO NOT LOWER THE ARC WHEN DEDUPE IS ENABLED!!

--
Freddie Cash
fjwc...@gmail.com
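The linked article's sizing approach can be tried before (or after) enabling dedupe. A rough sketch, with the pool name as a placeholder and the ~320 bytes/entry figure taken from that article's rule of thumb:

# zdb -S tank      (simulate dedup on existing data; prints a block histogram
                    and an estimated dedup ratio)
# zdb -DD tank     (on a pool with dedup already enabled: print actual DDT stats)

Multiply the total block count by roughly 320 bytes to estimate the in-core DDT size; if that number approaches your ARC size, dedupe is going to hurt.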
Re: [zfs-discuss] ZFS forensics
Did you see this link? http://www.solarisinternals.com/wiki/index.php/ZFS_forensics_scrollback_script

It may be out of date already.

regards

On 11/23/2011 11:14 AM, Gary Driggs wrote:

Is zdb still the only way to dive into the file system? I've seen the extensive work by Max Bruning on this, but I wonder if there are any tools that make this easier...?

-Gary
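For anyone starting down the zdb road, the usual first steps are label and uberblock inspection. A short sketch (device, pool, and dataset names are examples):

# zdb -l /dev/rdsk/c0t0d0s0    (dump the four vdev labels on a disk)
# zdb -uuu tank                (print the active uberblock in detail)
# zdb -dddd tank/fs            (walk a dataset's object metadata)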
Re: [zfs-discuss] Oracle releases Solaris 11 for Sparc and x86 servers
AFAIK, there has been no change in the open-source policy for Oracle Solaris.

On 11/9/2011 10:34 PM, Fred Liu wrote:

> ... so when will zfs-related improvements make it to Solaris derivatives :D ?

I am also very curious about Oracle's policy on source code. ;-)

Fred
Re: [zfs-discuss] sd_max_throttle
For a ZFS appliance serving NFS or SMB (CIFS) as a file server, sd_max_throttle does not come into play; for FC or iSCSI it may.

regards

On 11/3/2011 5:29 PM, Gary wrote:

Hi folks,

I'm reading through some I/O performance tuning documents and am finding some older references to the sd_max_throttle kernel/project settings. Have there been any recent books or documentation written that talk about this more in depth? It seems to be more appropriate for FC or DAS, but I'm wondering if anyone has had to touch this or other settings with ZFS appliances they've built...?

-Gary
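For reference, the tunable is traditionally set in /etc/system and caps the number of commands queued per LUN. A sketch — the value 32 is just a common starting point from old array-vendor tuning guides, not a recommendation:

set sd:sd_max_throttle=32
      (use ssd:ssd_max_throttle instead where FC disks attach via the ssd driver)

A reboot is required for /etc/system changes to take effect.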
Re: [zfs-discuss] Is there any implementation of VSS for a ZFS iSCSI snapshot on Solaris?
http://download.oracle.com/docs/cd/E22471_01/html/820-4167/application_integration__microsoft.html#application_integration__microsoft__sun_storage_7000_provider_for_microsoft_vs

On 9/15/2011 9:19 AM, S Joshi wrote:

By "iirc" do you mean "if I remember correctly", or is there a company called iirc? Which ZFS appliance are you referring to?

Thanks

On Wed, 14 Sep 2011 18:01:37 -0400, laot...@gmail.com wrote:

> iirc zfs appliance has vss support
>
> Sent from my iPad
> Hung-Sheng Tsao ( LaoTsao) Ph.D

On Sep 14, 2011, at 17:02, S Joshi <bit05...@hotmail.com> wrote:

> I am using a Solaris + ZFS environment to export an iSCSI block-layer device and use the snapshot facility to take snapshots of the ZFS volume. Is there an existing Volume Shadow Copy (VSS) implementation on Windows for this environment?
>
> Thanks
> S Joshi
Re: [zfs-discuss] Kernel panic on zpool import. 200G of data inaccessible!
Maybe try the following:

1) Boot the S10U8 CD into single-user mode (when booting from the CD-ROM, choose Solaris, then choose single user mode (6))
2) When asked to mount rpool, just say no
3) mkdir /tmp/mnt1 /tmp/mnt2
4) zpool import -f -R /tmp/mnt1 tank
5) zpool import -f -R /tmp/mnt2 rpool

On 8/15/2011 9:12 AM, Stu Whitefish wrote:

>> On Thu, Aug 4, 2011 at 2:47 PM, Stuart James Whitefish <swhitef...@yahoo.com> wrote:
>> # zpool import -f tank
>> http://imageshack.us/photo/my-images/13/zfsimportfail.jpg/
>
> I encourage you to open a support case and ask for an escalation on CR 7056738.
>
> -- Mike Gerdts

Hi Mike,

Unfortunately I don't have a support contract. I've been trying to set up a development system on Solaris and learn it. Until this happened, I was pretty happy with it. Even so, I don't have supported hardware, so I couldn't buy a contract until I bought another machine, and I really have enough machines, so I cannot justify the expense right now. And I refuse to believe Oracle would hold people hostage in a situation like this, but I do believe they could generate a lot of goodwill by fixing this for me and whoever else it happened to, and telling us what level of Solaris 10 this is fixed at, so this doesn't keep happening.

It's a pretty serious failure, and I'm not the only one it happened to. It's incredible, but in all the years I have been using computers I don't ever recall losing data due to a filesystem or OS issue. That includes DOS, Windows, Linux, etc. I cannot believe ZFS on Intel is so fragile that people lose hundreds of gigs of data and that's just the way it is. There must be a way to recover this data, and some advice on preventing it from happening again.

Thanks,
Jim
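One more idea, not from the original thread: if the panic happens while the import replays the pool's intent log, a read-only import can sometimes get the data out. A hedged sketch — the readonly import option exists in Solaris 11 Express-era ZFS but not in S10U8, so it only applies when booted from newer media:

# mkdir /tmp/mnt1
# zpool import -f -R /tmp/mnt1 -o readonly=on tank

With readonly=on nothing is written to the pool, so a failed attempt at least leaves it no worse off.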
Re: [zfs-discuss] Kernel panic on zpool import. 200G of data inaccessible!
On 8/15/2011 11:25 AM, Stu Whitefish wrote:

> Hi. Thanks. I have tried this on Update 8 and Sol 11 Express. The import always results in a kernel panic, as shown in the picture. I did not try an alternate mountpoint, though. Would it make that much difference?

try it
Re: [zfs-discuss] Scripting
hi

Most modern servers have a separate ILOM that supports ipmitool, which can talk to the HDDs. What is your server? Does it have a separate remote management port?

On 8/10/2011 8:36 AM, Lanky Doodle wrote:

Hiya,

Now that I have figured out how to read disks using dd to make LEDs blink, I want to write a little script that iterates through all drives, dd's them with a few thousand counts, stops, then dd's them again with another few thousand counts, so I end up with maybe 5 blinks. I don't want somebody to write something for me; I'd like to be pointed in the right direction so I can build one myself :)

Thanks
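As a pointer rather than a finished answer: the shape of such a script is just nested loops over the disk devices. A minimal sketch — the device glob, block counts, and sleep interval are all assumptions to tune for your controller layout:

#!/bin/sh
# Blink each disk's activity LED by reading from it in bursts.
for disk in /dev/rdsk/c*t*d0s0; do
    echo "blinking $disk"
    i=0
    while [ $i -lt 5 ]; do                        # 5 blinks per drive
        dd if="$disk" of=/dev/null bs=128k count=2000 2>/dev/null
        sleep 1                                   # gap so each burst reads as one blink
        i=`expr $i + 1`
    done
done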
[zfs-discuss] zpool import -R /tpools zpool hangs
hi

Trying to import a zpool at a different mount root hangs forever. How does one recover? Can one kill the import job?

 1 S root     5     0  0  0 SD    ?    0 ?    Jun 27     ?     8:58 zpool-rootpool
 1 S root 16786     0  0  0 SD    ?    0 ?    16:11:15   ?     0:00 zpool-as_as
 0 S root 16866 16472  0 40 20    ? 1261 ?    16:13:09 pts/4   0:01 zpool import -R /tpools ora_as_arch
 1 S root 16856     0  0  0 SD    ?    0 ?    16:12:57   ?     0:00 zpool-ora_asdb_new
 1 S root 16860     0  0  0 SD    ?    0 ?    16:13:02   ?     0:00 zpool-as_wc_new
 1 S root 16865     0  0  0 SD    ?    0 ?    16:13:09   ?     0:00 zpool-ora_ppl_arch
 1 S root 16867     0  0  0 SD    ?    0 ?    16:13:11   ?     0:00 zpool-ora_as_arch
 1 S root 16858     0  0  0 SD    ?    0 ?    16:12:59   ?     0:00 zpool-as_search_new
 1 S root 16854     0  0  0 SD    ?    0 ?    16:12:55   ?     0:00 zpool-ora_as_arch_new
 1 S root 16863     0  0  0 SD    ?    0 ?    16:13:06   ?     0:00 zpool-ora_herm_arch

What are these other jobs?

TIA
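A couple of standard ways to see where the import is stuck (the PID is from the ps output above). Note that the zpool-<poolname> entries are per-pool kernel I/O task queue processes, not jobs you can kill, and a zpool import blocked in the kernel usually cannot be killed either:

# pstack 16866                       (user-level stack of the hung import command)
# echo "::threadlist -v" | mdb -k    (kernel thread stacks; look for threads
                                      sitting in txg or zio wait functions)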
Re: [zfs-discuss] Have my RMA... Now what??
Yes, good idea. Another thing to keep in mind: technology changes so fast that by the time you want a replacement, that HDD model may not exist any more, or the supplier may have changed, so the drives will not be exactly like your original drive.

On 5/28/2011 6:05 PM, Michael DeMan wrote:

Always pre-purchase one extra drive to have on hand. When you get it, confirm it was not dead-on-arrival by hooking it up via external USB to a workstation and running whatever your favorite tools are to validate that it is okay. Then put it back in its original packaging, and put a label on it noting what it is and that it is a spare for box(es) XYZ disk system.

When a drive fails, use that one off the shelf to do your replacement immediately, then deal with the RMA, paperwork, and snail mail to get the bad drive replaced. Also, depending on how many disks you have in your array, keeping multiple spares can be a good idea as well, to cover another disk dying while you wait on that replacement.

In my opinion, the above goes whether you have your disk system configured with a hot spare or not. And the technique is applicable to both personal/home use and commercial use if your data is important.

- Mike

On May 28, 2011, at 9:30 AM, Brian wrote:

> I have a raidz2 pool with one disk that seems to be going bad; several errors are noted in iostat. I have an RMA for the drive; however, now I am wondering how I proceed. I need to send the drive in, and then they will send me one back. If I had the drive on hand, I could do a zpool replace. Do I do a zpool offline? zpool detach? Once I get the drive back and put it in the same drive bay... is it just a zpool replace <device>?
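To answer Brian's concrete question, the usual sequence for a raidz2 member going out under RMA looks roughly like this (the device name is an example; zpool detach is for mirrors, not raidz):

# zpool offline tank c1t5d0    (take the failing disk out of service; raidz2
                                still has parity to spare)
  (pull the drive, ship it, wait for the replacement, insert it in the same bay)
# zpool replace tank c1t5d0    (one argument: replace the disk with the new one
                                in the same slot; resilver starts)
# zpool status tank            (watch the resilver finish, then zpool clear tank
                                to reset the old error counts)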