[zfs-discuss] RAID Z stripes
I want to build a server with 16 1TB drives arranged as two 8-drive RAIDZ2 vdevs striped together, with the option of adding further vdevs of 2TB drives in the future. Will this be a problem? I have read that it is best to keep the vdevs the same width, and I was planning to do that, but I am wondering about mixing drives of different sizes. All of these drives would be in a single pool.

-- Terry Hull, Network Resource Group, Inc.
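For concreteness, a minimal sketch of the layout described above, assuming a hypothetical pool name (tank) and hypothetical device names (c0t0d0 through c0t15d0, then c1t0d0 through c1t7d0 for the later drives):

    # initial pool: two 8-drive RAIDZ2 top-level vdevs striped together
    zpool create tank \
        raidz2 c0t0d0 c0t1d0 c0t2d0 c0t3d0 c0t4d0 c0t5d0 c0t6d0 c0t7d0 \
        raidz2 c0t8d0 c0t9d0 c0t10d0 c0t11d0 c0t12d0 c0t13d0 c0t14d0 c0t15d0

    # later expansion: add another top-level vdev built from 2TB drives
    zpool add tank raidz2 c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0 c1t6d0 c1t7d0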
Re: [zfs-discuss] RAID Z stripes
From: Phil Harman phil.har...@gmail.com
Date: Tue, 10 Aug 2010 09:24:52 +0100
To: Ian Collins i...@ianshome.com
Cc: Terry Hull t...@nrg-inc.com, zfs-discuss@opensolaris.org
Subject: Re: [zfs-discuss] RAID Z stripes

On 10 Aug 2010, at 08:49, Ian Collins i...@ianshome.com wrote:

On 08/10/10 06:21 PM, Terry Hull wrote:
[Terry:] I want to build a server with 16 1TB drives arranged as two 8-drive RAIDZ2 vdevs striped together, with the option of adding further vdevs of 2TB drives in the future. Will this be a problem? I have read that it is best to keep the vdevs the same width, and I was planning to do that, but I am wondering about mixing drives of different sizes. These drives would all be in a single pool.

[Ian:] It would work, but you run the risk of the smaller drives becoming full and all new writes going to the bigger vdev. So while usable, performance would suffer.

[Phil:] Almost by definition, the 1TB drives are likely to be getting full when the new drives are added (presumably because the pool is running out of space). Performance can only be said to suffer relative to a new pool built entirely with drives of the same size. Even if he added 8x 2TB drives in a RAIDZ3 configuration, it is hard to predict what the performance gap would be (on the one hand, RAIDZ3 vs RAIDZ2; on the other, an empty group vs an almost full, presumably fragmented, group).

[Ian:] One option would be to add the 2TB drives as 5-drive RAIDZ3 vdevs. That way your vdevs would be approximately the same size and you would have optimum redundancy for the 2TB drives.

[Phil:] I think you meant 6, but I don't see a good reason for matching the group sizes. I'm for RAIDZ3, but I don't see much logic in mixing groups of 6+2 x 1TB and 3+3 x 2TB in the same pool (in one group I appear to care most about maximising space, in the other about maximising availability).

[Ian:] The other issue is that of hot spares. In a pool of mixed drive sizes you either waste array slots (by having spares of different sizes) or plan on having unavailable space when small drives are replaced by large ones.

[Terry:] So do I understand correctly that the right thing to do is to build a pool not only with a consistent stripe width, but also with drives of only one size? It also sounds like, from a practical point of view, building the pool at full size from the start is the best policy, so that data can be spread relatively uniformly across all the drives from the very beginning. In my case, I think I will start with the 16 drives in a single pool and, when I need more space, create a new pool and manually move some of the existing data to it to spread the I/O load. The other issue here seems to be RAIDZ2 vs RAIDZ3. I assume there is not a significant performance difference between the two for most loads; rather, I choose between them based on how badly I want the array to stay intact.

- Terry
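For concreteness, a sketch of the kind of expansion being debated, reusing the hypothetical pool name tank and hypothetical device names from the earlier sketch; zpool iostat -v reports per-vdev allocation, which is one way to watch whether new writes land mostly on the empty group:

    # add a group of 2TB drives as a new top-level RAIDZ3 vdev
    zpool add tank raidz3 c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0 c2t5d0 c2t6d0 c2t7d0

    # per-vdev capacity and I/O, sampled every 5 seconds
    zpool iostat -v tank 5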
Re: [zfs-discuss] PowerEdge R510 with PERC H200/H700 with ZFS
From: Geoff Nordli geo...@gnaa.net
Date: Sat, 7 Aug 2010 08:39:46 -0700
To: zfs-discuss@opensolaris.org
Subject: Re: [zfs-discuss] PowerEdge R510 with PERC H200/H700 with ZFS

-----Original Message-----
From: Brian Hechinger [mailto:wo...@4amlunch.net]
Sent: Saturday, August 07, 2010 8:10 AM
To: Geoff Nordli
Subject: Re: [zfs-discuss] PowerEdge R510 with PERC H200/H700 with ZFS

On Sat, Aug 07, 2010 at 08:00:11AM -0700, Geoff Nordli wrote:

Anyone have any experience with an R510 with the PERC H200/H700 controller with ZFS?

Not that particular setup, but I do run Solaris on a Precision 690 with PERC 6i controllers.

My perception is that Dell doesn't play well with OpenSolaris.

What makes you say that? I've run Solaris on quite a few Dell boxes and have never had any issues.

-brian

Hi Brian. I am glad to hear that, because I would prefer to use a Dell box. Is there a JBOD mode with the PERC 6i? It is funny how one forms these views as one gathers information.

Geoff

It is just that lots of the PERC controllers do not do JBOD very well. I've done it several times by making a RAID 0 for each drive. Unfortunately, that means the server has lots of RAID hardware that is not utilized very well. Also, ZFS loves to see lots of spindles, and Dell boxes tend not to have many drive bays compared to what you can build at a given price point. Of course, then you have warranty / service issues to consider.

-- Terry Hull, Network Resource Group, Inc.
Re: [zfs-discuss] PowerEdge R510 with PERC H200/H700 with ZFS
From: Geoff Nordli geo...@gnaa.net
Date: Sat, 7 Aug 2010 14:11:37 -0700
To: Terry Hull t...@nrg-inc.com, zfs-discuss@opensolaris.org
Subject: RE: [zfs-discuss] PowerEdge R510 with PERC H200/H700 with ZFS

[stuff deleted]

Terry, you are right, the part that was really missing with the Dell was the lack of spindles. It seems the R510 can have up to 12 spindles. The online configurator only allows you to select SLC SSDs, which are a lot more expensive than the MLC versions. It would be nice to use MLC, since that works fine for L2ARC. I believe they have an onboard SD flash connector too; it would be great to install the base OS onto a flash card and not waste two drives. Are you using the Broadcom or Intel NICs?

The clear benefit of buying name brand is the warranty/service side of things, which is important to me. I don't want to spend any time worrying about or fixing boxes.

I understand that one. I have been using both Intel and Broadcom NICs successfully. My gut tells me I like the Intel better, but I can't say that is because I have had trouble with the Broadcom. It is just a personal preference that I probably can't justify.

-- Terry
[zfs-discuss] ZFS / Network tuning recommendations for iSCSI
I have seen some network recommendations for tuning a small storage server from the network side. I am currently using this set and wondered if there were other things I should be tweaking:

ndd -set /dev/tcp tcp_xmit_hiwat 1048576
ndd -set /dev/tcp tcp_recv_hiwat 8388608
ndd -set /dev/tcp tcp_max_buf 8388608
ndd -set /dev/udp udp_xmit_hiwat 1048576
ndd -set /dev/udp udp_recv_hiwat 8388608
ndd -set /dev/tcp tcp_conn_req_max_q 65536
ndd -set /dev/tcp tcp_conn_req_max_q0 65536
ndd -set /dev/tcp tcp_fin_wait_2_flush_interval 67500
ndd -set /dev/tcp tcp_naglim_def 1

I realize the UDP options have nothing to do with iSCSI, but I applied them anyway as it seemed to make sense. The machine I'm using has an 8-drive RAIDZ2 of 1TB drives, a quad-core processor, and 4GB of RAM. 98% of its load is as an iSCSI target for ESX. I do have write caching turned on and have verified that turning it off causes a significant write performance penalty. I am not currently using bonded NICs, but I am using jumbo frames. Are there other things I should be tweaking?

-- Terry Hull, Network Resource Group, Inc.
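Settings made with ndd do not persist across a reboot. A minimal sketch of one common way to reapply them at boot, assuming a hypothetical legacy init script (an SMF service would also work):

    # /etc/init.d/nettune  (hypothetical name; symlink as /etc/rc2.d/S99nettune)
    #!/sbin/sh
    ndd -set /dev/tcp tcp_xmit_hiwat 1048576
    ndd -set /dev/tcp tcp_recv_hiwat 8388608
    ndd -set /dev/tcp tcp_max_buf 8388608
    ndd -set /dev/tcp tcp_conn_req_max_q 65536
    ndd -set /dev/tcp tcp_conn_req_max_q0 65536
    ndd -set /dev/tcp tcp_naglim_def 1

Current values can be checked with ndd -get, e.g. ndd -get /dev/tcp tcp_xmit_hiwat.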
[zfs-discuss] Logical Units and ZFS send / receive
I have logical units created with sbdadm create-lu that I am replicating with zfs send / receive between two build 134 hosts. These LUs are iSCSI targets used as VMFS filesystems and as ESX RDMs mounted on a Windows 2003 machine. The zfs pool names are the same on both machines. The replication seems to be going correctly; however, when I try to use the LUs on the server I am replicating the data to, I have issues. Here is the scenario: the LUs are created as sparse, and this is the process I'm going through after the snapshots are replicated to the secondary machine:

* Original machine: svccfg export -a stmf /tmp/stmf.cfg
* Copy stmf.cfg to the second machine
* Secondary machine: svcadm disable stmf
* svccfg delete stmf
* cd /var/svc/manifest
* svccfg import system/stmf.xml
* svcadm disable stmf
* svccfg import /tmp/stmf.cfg

At this point stmfadm list-lu -v shows the SCSI LUs all as "unregistered". When I try to import the LUs with stmfadm import-lu /dev/zvol/rdsk/pool-name, I get: stmfadm: meta data error. It is as if the pool does not exist. However, I can verify that the pool does exist with zfs list, and zfs list -t snapshot shows the snapshot that I replicated. Any suggestions?

-- Terry Hull, Network Resource Group, Inc.
Re: [zfs-discuss] Logical Units and ZFS send / receive
From: Richard Elling rich...@nexenta.com
Date: Wed, 4 Aug 2010 11:05:21 -0700
Subject: Re: [zfs-discuss] Logical Units and ZFS send / receive

On Aug 3, 2010, at 11:58 PM, Terry Hull wrote:

I have logical units created with sbdadm create-lu that I am replicating with zfs send / receive between two build 134 hosts. These LUs are iSCSI targets used as VMFS filesystems and as ESX RDMs mounted on a Windows 2003 machine. The zfs pool names are the same on both machines. The replication seems to be going correctly; however, when I try to use the LUs on the server I am replicating the data to, I have issues. The LUs are created as sparse. Here is the process I'm going through after the snapshots are replicated to the secondary machine:

How did you replicate? In b134, the COMSTAR metadata is placed in hidden parameters in the dataset. These are not transferred via zfs send by default. This metadata includes the LU.
-- richard

Does the -p option on zfs send solve that problem? What else is not sent by default? In other words, am I better off sending the metadata with the zfs send, or am I better off just creating the GUID once I get the data transferred?

-- Terry Hull, Network Resource Group, Inc.
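For reference, a sketch of what sending properties along with the stream can look like, assuming hypothetical dataset and host names; -R sends a recursive replication stream that includes properties, and on builds that support it, -p includes properties with a plain send:

    zfs snapshot tank/iscsi/lu0@rep1
    zfs send -p tank/iscsi/lu0@rep1 | ssh backuphost zfs receive -F tank/iscsi/lu0

    # or recursively, with properties, for everything under tank/iscsi
    zfs send -R tank/iscsi@rep1 | ssh backuphost zfs receive -F -d tank

Whether the hidden COMSTAR metadata travels with either form is exactly the question in this thread.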
Re: [zfs-discuss] Logical Units and ZFS send / receive
From: Richard Elling rich...@nexenta.com
Date: Wed, 4 Aug 2010 18:40:49 -0700
To: Terry Hull t...@nrg-inc.com
Cc: zfs-discuss@opensolaris.org
Subject: Re: [zfs-discuss] Logical Units and ZFS send / receive

On Aug 4, 2010, at 1:27 PM, Terry Hull wrote:

[earlier messages quoted]

Does the -p option on zfs send solve that problem?

I am unaware of a zfs send -p option. Did you mean the -R option? The LU metadata is stored in the stmf_sbd_lu property. You should be able to get/set it.

On the source machine I did zfs get -H stmf_sbd_lu pool-name. In my case that gave me:

tank/iscsi/bg-man5-vmfs stmf_sbd_lu 554c4442534e555307020702 010001843000b7010100ff862005 00c01200 180009fff1030010600144f0fa354000 4c4f9edb0003 7461 6e6b2f69736373692f62672d6d616e352d766d6673002f6465762f7a766f6c2f7264736b2f74 616e6b2f69736373692f62672d6d616e352d766d667300e70100 002200ff080 local

(but it was all one line). I cut the numeric section out of the above and then did zfs set stmf_sbd_lu=(the value cut above) pool_name, and that seemed to work. However, when I did stmfadm import-lu /dev/zvol/rdsk/pool I still get a "meta file error". Yet when I do zfs get -H stmf_sbd_lu pool_name on the secondary system, it now matches the result on the first system. BTW: the zfs send -p option is described as "send properties". It seems like it should not be this hard to transfer an LU with zfs send / receive.

What else is not sent by default? In other words, am I better off sending the metadata with the zfs send, or am I better off just creating the GUID once I get the data transferred?

I don't think this is a GUID issue.
-- richard

--
Richard Elling
rich...@nexenta.com +1-760-896-4422
Enterprise class storage for everyone
www.nexenta.com

-- Terry Hull
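A condensed sketch of the property-copy approach described above, reusing the dataset name from the post (whether import-lu then succeeds is exactly what is in question here):

    # on the source host: capture the raw LU metadata
    zfs get -H -o value stmf_sbd_lu tank/iscsi/bg-man5-vmfs > /tmp/sbd_lu.txt
    # (copy /tmp/sbd_lu.txt to the secondary host)

    # on the secondary host: apply it to the received zvol, then attempt the import
    zfs set stmf_sbd_lu=$(cat /tmp/sbd_lu.txt) tank/iscsi/bg-man5-vmfs
    stmfadm import-lu /dev/zvol/rdsk/tank/iscsi/bg-man5-vmfs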
Re: [zfs-discuss] ZFS mirrored boot disks
Interestingly, with the machine running, I can pull the first drive in the mirror, replace it with an unformatted one, format it, mirror rpool over to it, install the boot loader, and at that point the machine will boot with no problems. It's only when the first disk is missing that I have a problem.

-- Terry
[zfs-discuss] ZFS mirrored boot disks
I have a machine with a Supermicro 8-port SATA card installed. I have had no problem creating a mirrored boot disk using the oft-repeated scheme:

prtvtoc /dev/rdsk/c4t0d0s2 | fmthard -s - /dev/rdsk/c4t1d0s2
zpool attach rpool c4t0d0s0 c4t1d0s0
(wait for the resilver to complete)
installgrub -m /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c4t1d0s0

Unfortunately, when I shut the machine down and remove the primary boot disk, it will no longer boot. I get the boot loader, and if I turn off the splash screen I see it get to the point of displaying the host name. At that point, it hangs forever. From the posts I've seen, this looks like a very standard scheme that just works. What could be missing from my procedure? I am running build 132, if that matters.

-- Terry
[zfs-discuss] ZFS replication primary secondary
First of all, I must apologize: I'm an OpenSolaris newbie, so please don't be too hard on me. Sorry if this has been beaten to death before, but I could not find it, so here goes. I want to have two disk servers that I replicate data between using send / receive with snapshots. Yes, I know that part is simple enough, but what happens when the primary server goes down and I actually need to make changes on the secondary? Can I then replicate the data back to the primary server without starting over? TIA.
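A sketch of the round trip being asked about, assuming hypothetical host and dataset names and that both sides still hold a common snapshot; after the primary comes back, an incremental send in the reverse direction avoids a full resend:

    # normal direction: primary -> secondary
    zfs snapshot tank/data@rep1
    zfs send -i tank/data@rep0 tank/data@rep1 | ssh secondary zfs receive -F tank/data

    # after a failover, changes accumulate on the secondary; send them back
    # from the most recent snapshot both sides still have in common
    ssh secondary zfs snapshot tank/data@failback
    ssh secondary zfs send -i tank/data@rep1 tank/data@failback | zfs receive -F tank/data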
Re: [zfs-discuss] ZFS replication primary secondary
Thanks for the info. If that last common snapshot gets destroyed on the primary server, is it then a full replication back to the primary server? Is that correct?

-- Terry