[zfs-discuss] how to upgrade
Hi, zfs upgrade shows version 4 and zpool upgrade shows version 15, and /etc/release shows Solaris 10 10/09 s10s_u8wos_08a SPARC. My zpool doesn't have support for split. Could you please suggest how to upgrade my Solaris box to the latest zfs and zpool versions so I get the newer features?

Thanks

Regards,
sridhar.
Re: [zfs-discuss] how to upgrade
On 10/22/2010 1:51 AM, sridhar surampudi wrote:
> zfs upgrade shows version 4 and zpool upgrade shows version 15 [...] my zpool doesn't have support for split.

Solaris 10 Update 9 (09/10) supports the 'zpool split' command. You'll need to perform an upgrade install to get that release on the box first. Then you'll need to upgrade any pool you plan to use the command on by doing this:

# zpool upgrade poolname

--
Erik Trimble
Java System Support
Mailstop: usca22-123
Phone: x17195
Santa Clara, CA
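The fuller sequence, as a sketch (the pool name 'tank' here is made up; note that once upgraded, a pool or filesystem can no longer be imported or mounted on older releases):

# zpool upgrade -v      # list the pool versions this release supports
# zpool upgrade tank    # upgrade one pool to the newest supported version
# zfs upgrade -a        # upgrade all filesystems to the newest ZFS version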
[zfs-discuss] Shared LUN's and ZFS
Is it possible to have a shared LUN between 2 servers using ZFS? Both servers can see the LUN, but when I do an import I get:

bash-3.00# zpool import
  pool: logs
    id: 3700399958960377217
 state: ONLINE
status: The pool was last accessed by another system.
action: The pool can be imported using its name or numeric identifier and the '-f' flag.
   see: http://www.sun.com/msg/ZFS-8000-EY
config:

        logs                                 ONLINE
          c1t600144F0E849ECE34CC17CAB000Dd0  ONLINE

bash-3.00# zpool import logs
cannot import 'logs': pool may be in use from other system, it was last accessed by pbmaster1 (hostid: 0x84fabfb5) on Fri Oct 22 07:57:53 2010
use '-f' to import anyway

When I do import it using -f, I can't see the files created on the other node.

Thanks
Re: [zfs-discuss] Shared LUN's and ZFS
Hi Tony,

Am 22.10.10 14:07, schrieb Tony MacDoodle:
> Is it possible to have a shared LUN between 2 servers using ZFS? [...] When I do import it using -f, I can't see the files created on the other node.

ZFS is not a clustered file system that can be mounted on several computers at the same time. Mounting a non-clustered file system on multiple nodes almost guarantees that you damage your file system immediately. So, don't do that!

Cheers,
budy
Re: [zfs-discuss] Shared LUN's and ZFS
On Fri, October 22, 2010 08:07, Tony MacDoodle wrote:
> Is it possible to have a shared LUN between 2 servers using ZFS? Both servers can see the LUN, but when I do an import I get: [...] When I do import it using -f, I can't see the files created on the other node.

No, it is not possible. ZFS is not a clustered/shared file system. If you want that functionality on Solaris you'll need to get something like QFS:

http://en.wikipedia.org/wiki/QFS

Under Linux a good example would be:

http://en.wikipedia.org/wiki/Global_File_System

Many machines can see the LUNs, but a ZFS pool can only be imported by one system at a time. Having multiple machines see the pool is handy for high-availability failover:

http://en.wikipedia.org/wiki/Solaris_Cluster
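The safe handoff between nodes looks like this (a sketch, using the 'logs' pool from the original post; the pool must never be imported on both nodes at once):

nodeA# zpool export logs     # flushes and releases the pool on node A
nodeB# zpool import logs     # node B can now import it cleanly, no -f needed

Only reach for 'zpool import -f' when the previous owner is known to be down, e.g. during a cluster failover after a crash.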
Re: [zfs-discuss] Does a zvol use the zil?
re == Richard Elling richard.ell...@gmail.com writes:

re> The risk here is not really different than that faced by normal disk
re> drives which have nonvolatile buffers (eg virtually all HDDs and some
re> SSDs). This is why applications can send cache flush commands when
re> they need to ensure the data is on the media.

It's probably different because of the iSCSI target reboot problem I've written about before:

  iSCSI initiator      iSCSI target        nonvolatile medium

  write A         -->
                  <--  ack A
  write B         -->
                  <--  ack B
                                     -->   [A]
                       [REBOOT]
  write C         -->  [timeout!]
  reconnect       -->
                  <--  ack Connected
  write C         -->
                  <--  ack C
  flush           -->
                                     -->   [C]
                  <--  ack Flush

In the above time chart, the initiator thinks A, B, and C are written, but in fact only A and C are written. I regard this as a failure of imagination in the SCSI protocol, but probably, with a better understanding of the details than I have, the initiator could be made to provably work around the problem. My guess has always been that no current initiators actually do, though.

I think it could happen also with a directly-attached SATA disk if you remove power from the disk without rebooting the host, so as Richard said it is not really different, except that in the real world it's much more common for an iSCSI target to lose power without the initiator also losing power than it is for a disk to lose power without its host adapter losing power.

The ancient practice of Unix filesystem design always treats cord-yanking as something that happens to the entire machine, and failing disks are not the filesystem's responsibility to work around, because how could it? This assumption should have been changed, and wasn't, when we entered the era of RAID and removable disks, where the connections to disks and the disks themselves are both allowed to fail. However, when NFS was designed, the assumption *WAS* changed: NFSv2 and earlier always operated with the write cache OFF to be safe from this, just as COMSTAR does in its (default?) abysmal-performance mode (so campuses bought Prestoserve cards (equivalent to a DDRdrive except much less silly because they have onboard batteries), or Auspex servers with included NVRAM, which are analogous outside the NFS world to NetApp/Hitachi/EMC FC/iSCSI targets, which always have big NVRAMs so they can leave the write cache off), and NFSv3 has a commit protocol that is smart enough to replay the 'write B', which makes the nonvolatile caches less necessary (so long as you're not closing files frequently, I guess?).

I think it would be smart to design more storage systems so NFS can replace the role of iSCSI for disk access. In Isilon or Lustre clusters this trick is common when a node can settle for unshared access to a subtree: create an image file on the NFS/Lustre back end and fill it with an ext3 or XFS filesystem, and writes to that inner filesystem become much faster because this Rube Goldberg arrangement discards the close-to-open consistency guarantee. We might use it in the ZFS world for actual physical disk access instead of iSCSI; for example, it should be possible to NFS-export a zvol and see a share with a single file in it named 'theTarget' or something, but this file would be without read-ahead. Better yet, to accommodate VMware limitations, would be to export a single fake /zvol share containing all NFS-shared zvols, with their files appearing within this share as you export them. Also it should be possible to mount vdev elements over NFS without deadlocks---I know that is difficult, but VMware does it.
Perhaps it cannot be done through the existing NFS client, but obviously it can be done somehow, and it would both solve the iSCSI target reboot problem and allow using more kinds of proprietary storage back end---the same reason VMware wants to give admins a choice applies to ZFS. When NFS is used in this way the disk image file is never closed, so the NFS server will not need a slog to give good performance: the same job is accomplished by double-caching the uncommitted data on the client so it can be replayed if the time diagram above happens.
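For reference, the COMSTAR configuration being contrasted here can be set up roughly like this (a sketch only; the pool, zvol name, and size are made up; wcd=true leaves the LU's write cache disabled, the slow-but-safe mode mentioned above):

# zfs create -V 100G tank/tgt0
# stmfadm create-lu -p wcd=true /dev/zvol/rdsk/tank/tgt0   # prints the LU GUID
# stmfadm add-view <GUID printed by create-lu>
# itadm create-target

Setting wcd=false instead enables the write cache and restores performance, at the cost of exactly the reboot window shown in the time chart.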
Re: [zfs-discuss] Performance issues with iSCSI under Linux
Some numbers...

zpool status
  pool: Pool_sas
 state: ONLINE
  scan: none requested
config:

        NAME                   STATE     READ WRITE CKSUM
        Pool_sas               ONLINE       0     0     0
          c4t5000C506A6D3d0    ONLINE       0     0     0
          c4t5000C506A777d0    ONLINE       0     0     0
          c4t5000C506AA43d0    ONLINE       0     0     0
          c4t5000C506AC4Fd0    ONLINE       0     0     0
          c4t5000C506AEF7d0    ONLINE       0     0     0
          c4t5000C506B27Fd0    ONLINE       0     0     0
          c4t5000C506B28Bd0    ONLINE       0     0     0
          c4t5000C506B46Bd0    ONLINE       0     0     0
          c4t5000C506B563d0    ONLINE       0     0     0
          c4t5000C506B643d0    ONLINE       0     0     0
          c4t5000C506B6D3d0    ONLINE       0     0     0
          c4t5000C506BBE7d0    ONLINE       0     0     0
          c4t5000C506C407d0    ONLINE       0     0     0
          c4t5000C506C657d0    ONLINE       0     0     0

errors: No known data errors

  pool: Pool_test
 state: ONLINE
  scan: none requested
config:

        NAME                       STATE     READ WRITE CKSUM
        Pool_test                  ONLINE       0     0     0
          c4t5000C5002103F093d0    ONLINE       0     0     0
          c4t5000C50021101683d0    ONLINE       0     0     0
          c4t5000C50021102AA7d0    ONLINE       0     0     0
          c4t5000C500211034D3d0    ONLINE       0     0     0
          c4t5000C500211035DFd0    ONLINE       0     0     0
          c4t5000C5002110480Fd0    ONLINE       0     0     0
          c4t5000C50021104F0Fd0    ONLINE       0     0     0
          c4t5000C50021119A43d0    ONLINE       0     0     0
          c4t5000C5002112392Fd0    ONLINE       0     0     0

errors: No known data errors

  pool: syspool
 state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        syspool     ONLINE       0     0     0
          c0t0d0s0  ONLINE       0     0     0

errors: No known data errors

=

Pool_sas is made of 14x 146G 15K SAS drives in a big stripe. For this test there is no log device or cache. Connected to it is a RedHat box using iSCSI through an Intel X520 10GbE NIC. It runs several large MySQL queries at once, each taking minutes to compute. Pool_test is a stripe of 2TB SATA drives, and a terabyte of files is being copied to it for another box during this test.

Here's the pastebin of iostat -xdn 10 on the Linux box: http://pastebin.com/431ESYaz
Here's the pastebin of iostat -xdn 10 on the Nexenta box: http://pastebin.com/9g7KD3Ku
Here's the pastebin of zpool iostat -v 10 on the Nexenta box: http://pastebin.com/05fJL5sw

From these numbers it looks like the Linux box is waiting for data all the time, while the Nexenta box isn't pulling nearly as much throughput and IOPS as it could. Where is the bottleneck? One thing suspicious is that we notice a slowdown of one pool when the other is under load. How can that be?

Ian
Re: [zfs-discuss] Running on Dell hardware?
On Wed, Oct 13 at 15:44, Edward Ned Harvey wrote:
> From: Henrik Johansen [mailto:hen...@scannet.dk]
>> The 10g models are stable - especially the R905's are real workhorses.
>
> You would generally consider all your machines stable now? Can you easily pdsh to all those machines?
> kstat | grep current_cstate ; kstat | grep supported_max_cstates

Dell T610, machine has been stable since we got it (relative to the failure modes you've mentioned):

current_cstate          1
current_cstate          1
current_cstate          1
current_cstate          0
current_cstate          1
current_cstate          1
current_cstate          0
current_cstate          1
supported_max_cstates   1
supported_max_cstates   1
supported_max_cstates   1
supported_max_cstates   1
supported_max_cstates   1
supported_max_cstates   1
supported_max_cstates   1
supported_max_cstates   1

--eric

--
Eric D. Mudama
edmud...@mail.bounceswoosh.org
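To run the same check across a whole fleet, a sketch assuming pdsh is installed and ~/nodes lists the hosts (both assumptions, not from the thread):

$ pdsh -w ^$HOME/nodes 'kstat -p cpu_info:::current_cstate' | awk '{print $NF}' | sort | uniq -c

kstat -p prints one statistic per line with the value in the last field, so the awk/sort/uniq tail just tallies how many CPUs fleet-wide sit in each C-state.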
[zfs-discuss] How does dedup work over iSCSI?
Folks,

Let's say I have a volume being shared over iSCSI, with dedup turned on. Let's say I copy the same file twice under different names at the initiator end, and each file ends up taking 5 blocks.

For dedup to work, each block of one file must match the corresponding block of the other file. Essentially, each pair of blocks being compared must have the same start location in the actual data. For a shared filesystem, ZFS may internally ensure that the block starts match. However, over iSCSI, the initiator does not even know about the whole block mechanism that ZFS has. It is just sending raw bytes to the target. This makes me wonder if dedup actually works over iSCSI.

Can someone please enlighten me on what I am missing?

Thank you in advance for your help.

Regards,
Peter
Re: [zfs-discuss] How does dedup work over iSCSI?
On 10/22/10 15:34, Peter Taps wrote:
> Let's say I have a volume being shared over iSCSI, with dedup turned on. [...] This makes me wonder if dedup actually works over iSCSI. Can someone please enlighten me on what I am missing?

No, ZFS doesn't care about the file offset, just that the checksums of the blocks match.
Re: [zfs-discuss] How does dedup work over iSCSI?
Hi Neil,

If the file offsets do not match, the chance that the checksums would match, especially with sha256, is almost 0.

Maybe I am missing something. Let's say I have a file that contains 11 letters - ABCDEFGHIJK. Let's say the block size is 5. For the first file, the block contents are ABCDE, FGHIJ, and K. For the second file, let's say the blocks are ABCD, EFGHI, and JK. The chance that any checksums match is very small. The chance that any checksum+verify matches is even smaller.

Regards,
Peter
Re: [zfs-discuss] Running on Dell hardware?
Hi All,

I'm currently considering purchasing 1 or 2 Dell R515's. With up to 14 drives and up to 64GB of RAM, it seems well suited for a low-end ZFS server. I know this box is new, but I wonder if anyone out there has any experience with it? How about the H700 SAS controller?

Anyone know where to find the Dell 3.5" sleds that take 2.5" drives? I want to put some SSDs in a box like this, but there's no way I'm going to pay Dell's SSD prices. $1300 for a 50GB 'mainstream' SSD? Are they kidding?

-Kyle
Re: [zfs-discuss] How does dedup work over iSCSI?
On 10/22/10 17:28, Peter Taps wrote:
> If the file offsets do not match, the chance that the checksums would match, especially with sha256, is almost 0. [...] The chance that any checksums match is very small.

The block size and contents have to match for ZFS dedup. See http://blogs.sun.com/bonwick/entry/zfs_dedup

Neil.
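A quick way to watch the alignment effect from the shell (a rough sketch; files a and b are hypothetical, with b's contents shifted by one byte relative to a, and 131072 matches the default 128K recordsize):

# split -b 131072 a /tmp/a.        # cut each file into 128K chunks,
# split -b 131072 b /tmp/b.        # the granularity dedup works at
# for f in /tmp/a.* /tmp/b.*; do digest -a sha256 $f; done

Aligned, identical chunks produce identical digests and would dedup against each other; the one-byte shift changes every chunk's digest, so nothing matches - which is exactly Peter's ABCDE/ABCD example at block scale.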
[zfs-discuss] Changing vdev controller
I have a 14-drive pool, in a 2x 7-drive raidz2, with l2arc and slog devices attached. I had a port go bad on one of my controllers (both are SAT2-MV8s), so I need to replace it (I have no spare ports on either card). My spare controller is an LSI 1068-based 8-port card.

My plan is to remove the l2arc and slog from the pool (to try and minimize any glitches), export the pool, change the controller, re-import, and then add back the l2arc and slog. Is that basically the correct process, or are there any tips for avoiding potential issues?

Thanks.
Re: [zfs-discuss] How does dedup work over iSCSI?
Neil Perrin wrote:
> On 10/22/10 15:34, Peter Taps wrote:
>> Let's say I have a volume being shared over iSCSI, with dedup turned on. [...]
>
> No, ZFS doesn't care about the file offset, just that the checksums of the blocks match.

One conclusion is that one should be careful not to mess up file alignments when working with large files (like you might have in virtualization scenarios). I.e., if you have a bunch of virtual machine image clones, they'll dedupe quite well initially. However, if you then make seemingly minor changes inside some of those clones (like changing their partition offsets to do 1MB alignment), you'll lose most or all of the dedupe benefits.

General-purpose compression tends to be less susceptible to changes in data offsets, but it also has its limits, based on algorithm and dictionary size. I think dedupe can be viewed as a special case of compression that happens to work quite well for certain workloads when given ample hardware resources (compared to what would be needed to run without dedupe).
Re: [zfs-discuss] Performance issues with iSCSI under Linux
> One thing suspicious is that we notice a slowdown of one pool when the other is under load. How can that be? Ian

A network switch that is being maxed out? Some switches cannot switch at rated line speed on all their ports at the same time; their internal buses simply don't have the bandwidth needed for that. Maybe you are running into that limit? (I know you mentioned bypassing the switch completely in some other tests and not noticing any difference.)

Any other hardware in common?
Re: [zfs-discuss] Newbie ZFS Question: RAM for Dedup
Never Best wrote:
> Sorry, I couldn't find this anywhere yet. For deduping it is best to have the lookup table in RAM, but I wasn't too sure how much RAM is suggested?
>
> ::Assuming 128KB block sizes, and 100% unique data:
> 1TB*1024*1024*1024/128 = 8388608 blocks
> ::Each block needs an 8-byte pointer?
> 8388608*8 = 67108864 bytes
> ::RAM suggested per TB:
> 67108864/1024/1024 = 64MB
>
> So if I understand correctly we should have a minimum of 64MB RAM per TB for deduping? *hopes my math wasn't way off*, or is there significant extra overhead stored per block for the lookup table? For example, is there some kind of redundancy on the lookup table (in relation to RAM space requirements) to counter corruption? I read some articles and they all mention that there is significant performance loss if the table isn't in RAM, but none really mentioned how much RAM one should have per TB of deduping. Thanks, hope someone can confirm *or give me the real numbers*. I know blocksize is variable; I'm most interested in the default ZFS setup right now.

There were several detailed discussions about this over the past 6 months that should be in the archives. I believe most of the info came from Richard Elling.
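As a rough sketch of the numbers usually quoted on this list (the ~320 bytes per DDT entry is a commonly cited estimate from those archived discussions, not a spec, and the pool name 'tank' is made up):

1 TB / 128 KB = 8388608 blocks
8388608 blocks * ~320 bytes/entry ~= 2.5 GB of RAM per TB of unique data

So the 8-byte-pointer guess above is low by about 40x. To see actual entry counts and on-disk/in-core sizes on an existing deduped pool:

# zdb -DD tank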
Re: [zfs-discuss] vdev failure - pool loss ?
Bob Friesenhahn wrote:
> On Tue, 19 Oct 2010, Cindy Swearingen wrote:
>> unless you use copies=2 or 3, in which case your data is still safe for those datasets that have this option set.
>
> This advice is a little too optimistic. Increasing the copies property value on datasets might help in some failure scenarios, but probably not in more catastrophic failures, such as multiple device or hardware failures.
>
> It is 100% too optimistic. The copies option only duplicates the user data. While ZFS already duplicates the metadata (regardless of the copies setting), it is not designed to function if a vdev fails.
>
> Bob

Some future filesystem (not ZFS as currently implemented) could be designed to handle certain vdev failures where multiple vdevs were used without redundancy at the vdev level. In this scenario, the redundant metadata, and the user data with copies=2+, would still be accessible by virtue of having been spread across the vdevs, with at least one copy surviving. Expanding upon this design would allow raw space to be added, with redundancy being set by a 'copies' parameter.

I understand the copies parameter to currently be designed and intended as extra assurance against failures that affect single blocks but not whole devices; i.e., run ZFS on a laptop with a single hard drive, and use copies=2 to protect against bad sectors but not complete drive failures. I have not tested this; however, I imagine performance is the reason to use copies=2 instead of partitioning/slicing the drive into two halves and mirroring the two halves back together. I also recall seeing something about the copies parameter attempting to spread the copies across different devices, as much as possible.
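For anyone wanting to try the laptop scenario, the knob looks like this (a sketch; 'tank/laptop' is a made-up dataset name):

# zfs set copies=2 tank/laptop
# zfs get copies tank/laptop

One caveat worth repeating from the zfs man page: the copies setting only affects blocks written after the property is set; existing data keeps its old copy count until rewritten.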
Re: [zfs-discuss] Running on Dell hardware?
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Kyle McDonald
>
> I'm currently considering purchasing 1 or 2 Dell R515's. [...] I want to put some SSDs in a box like this, but there's no way I'm going to pay Dell's SSD prices. $1300 for a 50GB 'mainstream' SSD? Are they kidding?

You are asking for a world of hurt. You may luck out, and it may work great, thus saving you money.

Take my example, for example... I took the safe approach (as far as any non-Sun hardware is concerned). I bought an officially supported Dell server, with all Dell-blessed and Solaris-supported components, with support contracts on both the hardware and software, fully patched and updated on all fronts, and I am getting system failures approx once per week. I have support tickets open with both Dell and Oracle right now... I have no idea how it's all going to turn out. But if you have a problem like mine using unsupported hardware, you have no alternative. You're up a tree full of bees, naked, with a hunter on the ground trying to shoot you. And IMHO, I think the probability of having a problem like mine is higher when you use unsupported hardware. But of course there's no definable way to quantify that belief.

My advice to you is: buy the supported hardware, and the support contracts for both the hardware and software. But of course, that's all just a calculated risk, and I doubt you're going to take my advice. ;-)
Re: [zfs-discuss] Performance issues with iSCSI under Linux
On Fri, Oct 22, 2010 at 10:40 PM, Haudy Kazemi kaze0...@umn.edu wrote:
> A network switch that is being maxed out? [...] Any other hardware in common?

There's almost zero chance a switch is being overrun by a single GigE connection. The worst switch I've seen is roughly 8:1 oversubscribed; you'd have to be maxing out many, many ports for a switch to be a problem. More likely you don't have enough RAM or CPU in the box.

--Tim
Re: [zfs-discuss] Running on Dell hardware?
On Fri, Oct 22, 2010 at 10:53 PM, Edward Ned Harvey sh...@nedharvey.com wrote:
> You are asking for a world of hurt. You may luck out, and it may work great, thus saving you money. [...] My advice to you is: buy the supported hardware, and the support contracts for both the hardware and software.

Dell has required Dell-branded drives as of roughly 8 months ago, and I don't think there was ever an H700 firmware release that didn't require this. I'd bet you're going to waste a lot of money to get a drive the system refuses to recognize.

--Tim
Re: [zfs-discuss] Changing vdev controller
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Dave
>
> I have a 14-drive pool, in a 2x 7-drive raidz2, with l2arc and slog devices attached. [...] My plan is to remove the l2arc and slog from the pool (to try and minimize any glitches), export the pool, change the controller, re-import, and then add back the l2arc and slog.

You really don't need to do that. You can just export (or shut down), swap controllers, and bring it up again. No need to remove the l2arc or slog.
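In other words, the whole swap is just (a sketch, with 'tank' standing in for the real pool name):

# zpool export tank
  ... power down, swap the controller, recable, boot ...
# zpool import tank        # ZFS finds the disks by their on-disk labels,
                           # so changed controller/device names don't matter
# zpool status tank        # confirm the raidz2 vdevs, slog and l2arc are ONLINE

If the import can't find the devices automatically, 'zpool import -d /dev/dsk tank' forces a scan of that directory.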
Re: [zfs-discuss] Performance issues with iSCSI under Linux
> What more info could you provide?

Quite a lot more, actually, like: how many streams of SQL and copy are you running? How are the filesystems/zvols configured (recordsize, etc.)? Some CPU, VM, and network stats would also be nice.

Based on the Nexenta iostats you've provided (a tiny window on what's happening), it appears that you have an 8k recordsize for SQL. If you add up all the IOPS for the SQL, it's roughly 2000 reads at around 3ms each, which might indicate at least 6 reads outstanding at any time. So how many queries do you have running in parallel? If you add more, I'd expect the service times to increase. 3ms isn't much for spinning rust, but isn't this why you are planning to use lots of L2ARC?

Could be a similar story on writes. How many parallel streams? How many files? What's the average file size? What's the client filesystem? How much does it sync to the server? Could it be that your client apps are always waiting for the spinning rust? Does an SSD log make any difference on this pool?

Sent from my iPhone

On 22 Oct 2010, at 19:57, Ian D rewar...@hotmail.com wrote:
> Some numbers... [...] From these numbers it looks like the Linux box is waiting for data all the time, while the Nexenta box isn't pulling nearly as much throughput and IOPS as it could. Where is the bottleneck? One thing suspicious is that we notice a slowdown of one pool when the other is under load. How can that be?
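For the CPU, VM, and network stats requested above, a starting sketch (the link name ixgbe0 is a placeholder; substitute whatever 'dladm show-link' reports for the 10GbE NIC):

# vmstat 10                          # memory pressure, scan rate, CPU summary
# mpstat 10                          # per-CPU utilization, cross-calls, mutex spins
# iostat -xnz 10                     # per-device latency/queueing, busy devices only
# dladm show-link -s -i 10 ixgbe0    # packet/byte counters on the 10GbE link

Running these on the Nexenta box while the MySQL queries and the copy are both active would show whether the bottleneck is CPU, memory, network, or the disks.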
Re: [zfs-discuss] Performance issues with iSCSI under Linux
Tim Cook wrote:
> On Fri, Oct 22, 2010 at 10:40 PM, Haudy Kazemi kaze0...@umn.edu wrote:
>> A network switch that is being maxed out? [...] Any other hardware in common?
>
> There's almost zero chance a switch is being overrun by a single GigE connection. The worst switch I've seen is roughly 8:1 oversubscribed. You'd have to be maxing out many, many ports for a switch to be a problem. Likely you don't have enough RAM or CPU in the box.
>
> --Tim

I agree, but am also trying not to assume anything. Looking back, Ian's first email said '10GbE on a dedicated switch'. I don't think the switch model was ever identified... perhaps it is a 1GbE switch with a few 10GbE ports? (Grasping at straws.)

What happens when Windows is the iSCSI initiator connecting to an iSCSI target on ZFS? If that is also slow, the issue is likely not in Windows or in Linux. Do CIFS shares (connected to from Linux and from Windows) show the same performance problems as iSCSI and NFS? If yes, this would suggest a common-cause item on the ZFS side.