Re: [zfs-discuss] Thumper Origins Q
On Wed, Jan 24, 2007 at 12:15:21AM -0700, Jason J. W. Williams wrote:
> Wow. That's an incredibly cool story. Thank you for sharing it! Does the Thumper today pretty much resemble what you saw then?

Yes, amazingly so: 4-way, 48 spindles, 4u. The real beauty of the match between ZFS and Thumper was (and is) that ZFS unlocks new economics in storage -- smart software achieving high performance and ultra-high reliability with dense, cheap hardware -- and that Thumper was (and is) the physical embodiment of those economics. And without giving away too much of our future roadmap, suffice it to say that one should expect much, much more from Sun in this vein: innovative software and innovative hardware working together to deliver world-beating systems with undeniable economics.

And actually, as long as we're talking history, you might be interested to know the story behind the name Thumper: Fowler initially suggested the name as something of a joke, but, as often happens with Fowler, he tells a joke with a straight face once too many to one person too many, and next thing you know it's the plan of record. I had suggested the name Humper for the server that became Andromeda (the x8000 series) -- so you could order a datacenter by asking for (say) two Humpers and five Thumpers. (And I loved the idea of asking "would you like a Humper for your Thumper?") But Fowler said the name was too risque (!). Fortunately the name Thumper stuck...

- Bryan

-- Bryan Cantrill, Solaris Kernel Development. http://blogs.sun.com/bmc
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Re: Re: How much do we really want zpool remove?
Rainer Heilke wrote:
>> For the clone another system zfs send/recv might be useful
> Keeping in mind that you only want to send/recv one half of the ZFS mirror...

Huh? That doesn't make any sense. You can't send half a mirror. When you are running zfs send it is a read, and ZFS will read the data from all available mirrors to help performance. When it is zfs recv, it will write to all sides of the mirror on the destination. What are you actually trying to say here?

-- Darren J Moffat
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
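[For anyone wanting to try the clone-to-another-system approach, a minimal sketch; the pool/dataset names and the remote host below are made-up examples. The send stream is taken from a snapshot of the dataset as a whole, so there is no notion of sending one side of a mirror:

# zfs snapshot pool0/mail@clone1
# zfs send pool0/mail@clone1 | ssh otherhost zfs recv pool0/mail

ZFS reads the blocks from whichever mirror sides are available on the source, and the receiving pool writes them out according to its own redundancy layout.]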
Re: [zfs-discuss] Thumper Origins Q
Actually, it was meant to hold the entire electronic transcript of the George Bush impeachment proceedings ... we were thinking ahead. Fortunately, larger disks became available in time. Casper ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Thumper Origins Q
On 24/1/07 9:06, Bryan Cantrill [EMAIL PROTECTED] wrote: But Fowler said the name was too risque (!). Fortunately the name Thumper stuck... I assumed it was a reference to Bambi... That's what comes from having small children :-) Cheers, Chris ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Thumper Origins Q
Chris,

well, Thumper is actually a reference to Bambi. The comment about being risque was referring to Humper as a codename proposed for a related server (and e.g. leo.org confirms that it has a meaning labelled as [vulg.] :-)

-- Roland

Chris Ridd schrieb:
> On 24/1/07 9:06, Bryan Cantrill [EMAIL PROTECTED] wrote:
>> But Fowler said the name was too risque (!). Fortunately the name Thumper stuck...
> I assumed it was a reference to Bambi... That's what comes from having small children :-)
> Cheers, Chris

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Re: ZFS direct IO
[EMAIL PROTECTED] writes:
>> Note also that for most applications, the size of their IO operations would often not match the current page size of the buffer, causing additional performance and scalability issues.
>
> Thanks for mentioning this, I forgot about it. Since ZFS's default block size is configured to be larger than a page, the application would have to issue page-aligned, block-sized I/Os. Anyone adjusting the block size would presumably be responsible for ensuring that the new size is a multiple of the page size. (If they would want Direct I/O to work...) I believe UFS also has a similar requirement, but I've been wrong before.

I believe the UFS requirement is that the I/O be sector aligned for DIO to be attempted.

And Anton did mention that one of the benefits of DIO is the ability to direct-read a subpage block. Without UFS/DIO the OS is required to read and cache the full page, and the extra amount of I/O may lead to data channel saturation (I don't see latency as an issue in here, right?). This is where I said that such a feature would translate for ZFS into the ability to read parts of a filesystem block, which would only make sense if checksums are disabled. And for RAID-Z that could mean avoiding I/Os to all disks but one in a group, so that's a nice benefit.

So for the performance-minded customer that can't afford mirroring, is not much of a fan of data integrity, and needs to do subblock reads against an uncacheable workload, then I can see a feature popping up. And this feature is independent of whether or not the data is DMA'ed straight into the user buffer.

The other feature is to avoid a bcopy by DMAing full filesystem block reads straight into the user buffer (and verifying the checksum after). The I/O is high latency; the bcopy adds a small amount. The kernel memory can be freed/reused straight after the user read completes. This is where I ask, how much CPU is lost to the bcopy in workloads that benefit from DIO?

At this point, there are lots of projects that will lead to performance improvements. The DIO benefits seem like small change in the context of ZFS. The quickest return on investment I see for the directio hint would be to tell ZFS to not grow the ARC when servicing such requests.

-r

-j
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
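[As a concrete illustration of the alignment point above (dataset name and size are hypothetical, not a recommendation): an application that issues fixed-size, aligned I/O can have the dataset's recordsize matched to it so that application I/Os line up with filesystem blocks:

# zfs create pool0/db
# zfs set recordsize=8k pool0/db
# zfs get recordsize pool0/db

recordsize must be a power of two between 512 bytes and 128K, and it only takes effect for files written after the property is set.]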
[zfs-discuss] panic with zfs
Hello,

We're setting up a new mailserver infrastructure and decided to run it on zfs. On an E220R with a D1000, I've set up a storage pool with four mirrors:

--
[EMAIL PROTECTED] # zpool status
  pool: pool0
 state: ONLINE
 scrub: none requested
config:

        NAME         STATE     READ WRITE CKSUM
        pool0        ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c1t0d0   ONLINE       0     0     0
            c5t8d0   ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c1t1d0   ONLINE       0     0     0
            c5t9d0   ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c1t2d0   ONLINE       0     0     0
            c5t10d0  ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c1t3d0   ONLINE       0     0     0
            c5t11d0  ONLINE       0     0     0

errors: No known data errors
--

Before we started to install any software on it, we got the idea to see how zfs behaves when something goes wrong. So we pulled out a disk while a mkfile was running. What happened then was not what we expected. The system was hanging for more than an hour and finally it panicked:

--
Jan 23 18:49:26 newponit genunix: [ID 611667 kern.info] NOTICE: glm0: got SCSI bus reset
Jan 23 18:50:36 newponit scsi: [ID 365881 kern.info] /[EMAIL PROTECTED],4000/[EMAIL PROTECTED] (glm0):
Jan 23 18:50:36 newponit        Cmd (0x6a3ed10) dump for Target 1 Lun 0:
Jan 23 18:50:36 newponit scsi: [ID 365881 kern.info] /[EMAIL PROTECTED],4000/[EMAIL PROTECTED] (glm0):
Jan 23 18:50:36 newponit        cdb=[ 0x2a 0x0 0x2 0x1b 0x2c 0x93 0x0 0x0 0x1 0x0 ]
Jan 23 18:50:36 newponit scsi: [ID 365881 kern.info] /[EMAIL PROTECTED],4000/[EMAIL PROTECTED] (glm0):
Jan 23 18:50:36 newponit        pkt_flags=0xc000 pkt_statistics=0x60 pkt_state=0x7
Jan 23 18:50:36 newponit scsi: [ID 365881 kern.info] /[EMAIL PROTECTED],4000/[EMAIL PROTECTED] (glm0):
Jan 23 18:50:36 newponit        pkt_scbp=0x0 cmd_flags=0x1860
Jan 23 18:50:36 newponit scsi: [ID 107833 kern.warning] WARNING: /[EMAIL PROTECTED],4000/[EMAIL PROTECTED] (glm0):
Jan 23 18:50:36 newponit        Disconnected tagged cmd(s) (1) timeout for Target 1.0
Jan 23 18:50:36 newponit genunix: [ID 408822 kern.info] NOTICE: glm0: fault detected in device; service still available
Jan 23 18:50:36 newponit genunix: [ID 611667 kern.info] NOTICE: glm0: Disconnected tagged cmd(s) (1) timeout for Target 1.0
Jan 23 18:50:36 newponit glm: [ID 401478 kern.warning] WARNING: ID[SUNWpd.glm.cmd_timeout.6018]
Jan 23 18:50:36 newponit scsi: [ID 107833 kern.warning] WARNING: /[EMAIL PROTECTED],4000/[EMAIL PROTECTED] (glm0):
Jan 23 18:50:36 newponit        got SCSI bus reset
Jan 23 18:50:36 newponit genunix: [ID 408822 kern.info] NOTICE: glm0: fault detected in device; service still available
Jan 23 18:50:36 newponit genunix: [ID 611667 kern.info] NOTICE: glm0: got SCSI bus reset
Jan 23 18:50:36 newponit scsi: [ID 107833 kern.warning] WARNING: /[EMAIL PROTECTED],4000/[EMAIL PROTECTED]/[EMAIL PROTECTED],0 (sd0):
Jan 23 18:50:36 newponit        SCSI transport failed: reason 'timeout': giving up
Jan 23 18:50:36 newponit md: [ID 312844 kern.warning] WARNING: md: state database commit failed
Jan 23 18:50:36 newponit last message repeated 1 time
Jan 23 18:51:38 newponit unix: [ID 836849 kern.notice]
Jan 23 18:51:38 newponit ^Mpanic[cpu2]/thread=3e81600:
Jan 23 18:51:38 newponit unix: [ID 268973 kern.notice] md: Panic due to lack of DiskSuite state
Jan 23 18:51:38 newponit database replicas. Fewer than 50% of the total were available,
Jan 23 18:51:38 newponit so panic to ensure data integrity.
Jan 23 18:51:38 newponit unix: [ID 10 kern.notice]
Jan 23 18:51:38 newponit genunix: [ID 723222 kern.notice] 02a1003c1230 md:mddb_commitrec_wrapper+a8 (a, 3e81600, 18e9250, 12ecc00, 18e9000, 1)
Jan 23 18:51:38 newponit genunix: [ID 179002 kern.notice]   %l0-3: 0030 0002 06a8e6c8
Jan 23 18:51:38 newponit   %l4-7: 012ecf48 0002 012ecc00
Jan 23 18:51:39 newponit genunix: [ID 723222 kern.notice] 02a1003c12e0 md_mirror:mirror_mark_resync_region+290 (0, 0, 68dacc0, 68da980, 0, 1)
Jan 23 18:51:39 newponit genunix: [ID 179002 kern.notice]   %l0-3: 068e9e80 0001
Jan 23 18:51:39 newponit   %l4-7: 0001 0183d400 0002
Jan 23 18:51:39 newponit genunix: [ID 723222 kern.notice] 02a1003c1390 md_mirror:mirror_write_strategy+5c0 (6885108, 0, 0, 0, 68dad20, 0)
Jan 23 18:51:39 newponit genunix: [ID 179002 kern.notice]   %l0-3: 030c33b8
Re: [zfs-discuss] panic with zfs
Ihsan Dogan wrote:
> Hello, We're setting up a new mailserver infrastructure and decided to run it on zfs. On an E220R with a D1000, I've set up a storage pool with four mirrors:
> -- [EMAIL PROTECTED] # zpool status pool: pool0 state: ONLINE scrub: none requested config: [...]
> Jan 23 18:51:38 newponit ^Mpanic[cpu2]/thread=3e81600:
> Jan 23 18:51:38 newponit unix: [ID 268973 kern.notice] md: Panic due to lack of DiskSuite state
> Jan 23 18:51:38 newponit database replicas. Fewer than 50% of the total were available,
> Jan 23 18:51:38 newponit so panic to ensure data integrity.

This message shows (and the rest of the stack proves) that your panic happened in SVM. It has NOTHING to do with zfs. So either you pulled the wrong disk, or the disk you pulled also contained SVM volumes (next to ZFS).

-- Michael Schuster    Sun Microsystems, Inc.
Recursion, n.: see 'Recursion'
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] panic with zfs
> Hello, We're setting up a new mailserver infrastructure and decided to run it on zfs. On an E220R with a D1000, I've set up a storage pool with four mirrors:

Good morning Ihsan ...

I see that you have everything mirrored here, that's excellent. When you pulled a disk, was it a disk that was containing a metadevice or was it a disk in the zpool? In the case of a metadevice, as you know, the system should have kept running fine. We have probably both done this over and over at various sites to demonstrate SVM to people.

If you pulled out a device in the zpool, well now we are in a whole new world, and I had heard that there was some *feature* in Solaris now that will protect the ZFS file system integrity by simply causing a system to panic if the last device in some redundant component was compromised.

I think you hit a major bug in ZFS personally.

Dennis
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] panic with zfs
Afternoon,

The panic looks due to the fact that your SVM state databases aren't all there, so when we came to update one of them we found that fewer than 50% of the state databases were available and crashed. This doesn't look like anything to do with ZFS. I'd check the output from metadb and see if it looks like you've got a SVM database on a disk that's also in use by ZFS.

> Jan 23 18:50:36 newponit        SCSI transport failed: reason 'timeout': giving up
> Jan 23 18:50:36 newponit md: [ID 312844 kern.warning] WARNING: md: state database commit failed
> Jan 23 18:50:36 newponit last message repeated 1 time
> Jan 23 18:51:38 newponit unix: [ID 836849 kern.notice]
> Jan 23 18:51:38 newponit ^Mpanic[cpu2]/thread=3e81600:
> Jan 23 18:51:38 newponit unix: [ID 268973 kern.notice] md: Panic due to lack of DiskSuite state
> Jan 23 18:51:38 newponit database replicas. Fewer than 50% of the total were available,
> Jan 23 18:51:38 newponit so panic to ensure data integrity.

Regards, Jason
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
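[For anyone debugging a similar panic, a quick way to see where the replicas live and whether a quorum survives; the device name below is just an example. SVM panics when fewer than half of its state database replicas remain available, so spreading replicas across more controllers than the data helps:

# metadb -i
# metadb -a -c 2 c1t2d0s7

The -i flag prints the replica status flags with an explanation of each; -a -c 2 adds two more replicas on the named slice.]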
Re: [zfs-discuss] panic with zfs
Hello Michael,

Am 24.1.2007 14:36 Uhr, Michael Schuster schrieb:
>>> -- [EMAIL PROTECTED] # zpool status pool: pool0 state: ONLINE scrub: none requested config: [...]
>>> Jan 23 18:51:38 newponit ^Mpanic[cpu2]/thread=3e81600:
>>> Jan 23 18:51:38 newponit unix: [ID 268973 kern.notice] md: Panic due to lack of DiskSuite state
>>> Jan 23 18:51:38 newponit database replicas. Fewer than 50% of the total were available,
>>> Jan 23 18:51:38 newponit so panic to ensure data integrity.
>> this message shows (and the rest of the stack proves) that your panic happened in SVM. It has NOTHING to do with zfs. So either you pulled the wrong disk, or the disk you pulled also contained SVM volumes (next to ZFS).

I noticed that the panic was in SVM and I'm wondering why the machine was hanging. SVM is only running on the internal disks (c0) and I pulled a disk from the D1000:

Jan 23 17:24:14 newponit scsi: [ID 107833 kern.warning] WARNING: /[EMAIL PROTECTED],4000/[EMAIL PROTECTED]/[EMAIL PROTECTED],0 (sd50):
Jan 23 17:24:14 newponit        SCSI transport failed: reason 'incomplete': retrying command
Jan 23 17:24:14 newponit scsi: [ID 107833 kern.warning] WARNING: /[EMAIL PROTECTED],4000/[EMAIL PROTECTED]/[EMAIL PROTECTED],0 (sd50):
Jan 23 17:24:14 newponit        disk not responding to selection
Jan 23 17:24:18 newponit scsi: [ID 107833 kern.warning] WARNING: /[EMAIL PROTECTED],4000/[EMAIL PROTECTED]/[EMAIL PROTECTED],0 (sd50):
Jan 23 17:24:18 newponit        disk not responding to selection

This is clearly the disk with ZFS on it: SVM has nothing to do with this disk. A minute later, the troubles started with the internal disks:

Jan 23 17:25:26 newponit scsi: [ID 365881 kern.info] /[EMAIL PROTECTED],4000/[EMAIL PROTECTED] (glm0):
Jan 23 17:25:26 newponit        Cmd (0x6a3ed10) dump for Target 0 Lun 0:
Jan 23 17:25:26 newponit scsi: [ID 365881 kern.info] /[EMAIL PROTECTED],4000/[EMAIL PROTECTED] (glm0):
Jan 23 17:25:26 newponit        cdb=[ 0x28 0x0 0x0 0x78 0x6 0x30 0x0 0x0 0x10 0x0 ]
Jan 23 17:25:26 newponit scsi: [ID 365881 kern.info] /[EMAIL PROTECTED],4000/[EMAIL PROTECTED] (glm0):
Jan 23 17:25:26 newponit        pkt_flags=0x4000 pkt_statistics=0x60 pkt_state=0x7
Jan 23 17:25:26 newponit scsi: [ID 365881 kern.info] /[EMAIL PROTECTED],4000/[EMAIL PROTECTED] (glm0):
Jan 23 17:25:26 newponit        pkt_scbp=0x0 cmd_flags=0x860
Jan 23 17:25:26 newponit scsi: [ID 107833 kern.warning] WARNING: /[EMAIL PROTECTED],4000/[EMAIL PROTECTED] (glm0):
Jan 23 17:25:26 newponit        Disconnected tagged cmd(s) (1) timeout for Target 0.0
Jan 23 17:25:26 newponit genunix: [ID 408822 kern.info] NOTICE: glm0: fault detected in device; service still available
Jan 23 17:25:26 newponit genunix: [ID 611667 kern.info] NOTICE: glm0: Disconnected tagged cmd(s) (1) timeout for Target 0.0
Jan 23 17:25:26 newponit glm: [ID 401478 kern.warning] WARNING: ID[SUNWpd.glm.cmd_timeout.6018]
Jan 23 17:25:26 newponit scsi: [ID 107833 kern.warning] WARNING: /[EMAIL PROTECTED],4000/[EMAIL PROTECTED] (glm0):
Jan 23 17:25:26 newponit        got SCSI bus reset
Jan 23 17:25:26 newponit genunix: [ID 408822 kern.info] NOTICE: glm0: fault detected in device; service still available

The SVM and ZFS disks are on separate SCSI buses, so theoretically there shouldn't be any impact on the SVM disks when I pull out a ZFS disk.

Ihsan
-- [EMAIL PROTECTED] http://ihsan.dogan.ch/ http://gallery.dogan.ch/
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] panic with zfs
> Hello Michael,
>
> Am 24.1.2007 14:36 Uhr, Michael Schuster schrieb:
>>> -- [EMAIL PROTECTED] # zpool status pool: pool0 state: ONLINE scrub: none requested config: [...]
>>> Jan 23 18:51:38 newponit ^Mpanic[cpu2]/thread=3e81600:
>>> Jan 23 18:51:38 newponit unix: [ID 268973 kern.notice] md: Panic due to lack of DiskSuite state
>>> Jan 23 18:51:38 newponit database replicas. Fewer than 50% of the total were available,
>>> Jan 23 18:51:38 newponit so panic to ensure data integrity.
>> this message shows (and the rest of the stack proves) that your panic happened in SVM. It has NOTHING to do with zfs. So either you pulled the wrong disk, or the disk you pulled also contained SVM volumes (next to ZFS).
>
> I noticed that the panic was in SVM and I'm wondering why the machine was hanging. SVM is only running on the internal disks (c0) and I pulled a disk from the D1000:

so the device that was affected had nothing to do with SVM at all. fine ... I have the exact same config here. Internal SVM and then external ZFS on two disk arrays on two controllers.

> Jan 23 17:24:14 newponit scsi: [ID 107833 kern.warning] WARNING: /[EMAIL PROTECTED],4000/[EMAIL PROTECTED]/[EMAIL PROTECTED],0 (sd50):
> Jan 23 17:24:14 newponit        SCSI transport failed: reason 'incomplete': retrying command
> Jan 23 17:24:14 newponit scsi: [ID 107833 kern.warning] WARNING: /[EMAIL PROTECTED],4000/[EMAIL PROTECTED]/[EMAIL PROTECTED],0 (sd50):
> Jan 23 17:24:14 newponit        disk not responding to selection
> Jan 23 17:24:18 newponit scsi: [ID 107833 kern.warning] WARNING: /[EMAIL PROTECTED],4000/[EMAIL PROTECTED]/[EMAIL PROTECTED],0 (sd50):
> Jan 23 17:24:18 newponit        disk not responding to selection
>
> This is clearly the disk with ZFS on it: SVM has nothing to do with this disk. A minute later, the troubles started with the internal disks:

Okay .. so are we back to looking at ZFS, or ZFS and the SVM components, or some interaction between these kernel modules? At this point I have to be careful not to fall into a pit of blind ignorance as I grope for the answer. Perhaps some data would help. Was there a core file in /var/crash/newponit ?

> Jan 23 17:25:26 newponit scsi: [ID 365881 kern.info] /[EMAIL PROTECTED],4000/[EMAIL PROTECTED] (glm0):
> Jan 23 17:25:26 newponit        Cmd (0x6a3ed10) dump for Target 0 Lun 0:
> Jan 23 17:25:26 newponit scsi: [ID 365881 kern.info] /[EMAIL PROTECTED],4000/[EMAIL PROTECTED] (glm0):
> Jan 23 17:25:26 newponit        cdb=[ 0x28 0x0 0x0 0x78 0x6 0x30 0x0 0x0 0x10 0x0 ]
> Jan 23 17:25:26 newponit scsi: [ID 365881 kern.info] /[EMAIL PROTECTED],4000/[EMAIL PROTECTED] (glm0):
> Jan 23 17:25:26 newponit        pkt_flags=0x4000 pkt_statistics=0x60 pkt_state=0x7
> Jan 23 17:25:26 newponit scsi: [ID 365881 kern.info] /[EMAIL PROTECTED],4000/[EMAIL PROTECTED] (glm0):
> Jan 23 17:25:26 newponit        pkt_scbp=0x0 cmd_flags=0x860
> Jan 23 17:25:26 newponit scsi: [ID 107833 kern.warning] WARNING: /[EMAIL PROTECTED],4000/[EMAIL PROTECTED] (glm0):
> Jan 23 17:25:26 newponit        Disconnected tagged cmd(s) (1) timeout for Target 0.0

so a pile of scsi noise above there .. one would expect that from a suddenly missing scsi device.

> Jan 23 17:25:26 newponit genunix: [ID 408822 kern.info] NOTICE: glm0: fault detected in device; service still available
> Jan 23 17:25:26 newponit genunix: [ID 611667 kern.info] NOTICE: glm0: Disconnected tagged cmd(s) (1) timeout for Target 0.0

NCR scsi controllers .. what OS revision is this ? Solaris 10 u 3 ? Solaris Nevada snv_55b ?
> Jan 23 17:25:26 newponit glm: [ID 401478 kern.warning] WARNING: ID[SUNWpd.glm.cmd_timeout.6018]
> Jan 23 17:25:26 newponit scsi: [ID 107833 kern.warning] WARNING: /[EMAIL PROTECTED],4000/[EMAIL PROTECTED] (glm0):
> Jan 23 17:25:26 newponit        got SCSI bus reset
> Jan 23 17:25:26 newponit genunix: [ID 408822 kern.info] NOTICE: glm0: fault detected in device; service still available
>
> SVM and ZFS disks are on separate SCSI buses, so theoretically there shouldn't be any impact on the SVM disks when I pull out a ZFS disk.

I still feel that you hit a bug in ZFS somewhere. Under no circumstances should a Solaris server panic and crash simply because you pulled out a single disk that was totally mirrored. In fact .. I will reproduce those conditions here and then see what happens for me.

Dennis
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] panic with zfs
Hello, Am 24.1.2007 14:40 Uhr, Dennis Clarke schrieb: We're setting up a new mailserver infrastructure and decided, to run it on zfs. On a E220R with a D1000, I've setup a storage pool with four mirrors: Good morning Ihsan ... I see that you have everything mirrored here, thats excellent. When you pulled a disk, was it a disk that was containing a metadevice or was it a disk in the zpool ? In the case of a metadevice, as you know, the system should have kept running fine. We have probably both done this over and over at various sites to demonstrate SVM to people. If you pulled out a device in the zpool, well now we are in a whole new world and I had heard that there was some *feature* in Solaris now that will protect the ZFS file system integrity by simply causing a system to panic if the last device in some redundant component was compromised. The disk was in a zpool. The SVM disks are on a separate SCSI bus, so they can't disturb each other. I think you hit a major bug in ZFS personally. For me it also looks like a bug. Ihsan -- [EMAIL PROTECTED] http://ihsan.dogan.ch/ http://gallery.dogan.ch/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] panic with zfs
Ihsan Dogan wrote: I think you hit a major bug in ZFS personally. For me it also looks like a bug. I think we don't have enough information to judge. If you have a supported version of Solaris, open a case and supply all the data (crash dump!) you have. HTH -- Michael SchusterSun Microsystems, Inc. Recursion, n.: see 'Recursion' ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] panic with zfs
Hello,

Am 24.1.2007 14:49 Uhr, Jason Banham schrieb:
> The panic looks due to the fact that your SVM state databases aren't all there, so when we came to update one of them we found that fewer than 50% of the state databases were available and crashed.

The metadbs are fine. I haven't touched them at all:

[EMAIL PROTECTED] # metadb
        flags           first blk       block count
     a m  p  luo        16              8192            /dev/dsk/c0t0d0s7
     a    p  luo        8208            8192            /dev/dsk/c0t0d0s7
     a    p  luo        16              8192            /dev/dsk/c0t1d0s7
     a    p  luo        8208            8192            /dev/dsk/c0t1d0s7

> This doesn't look like anything to do with ZFS. I'd check the output from metadb and see if it looks like you've got a SVM database on a disk that's also in use by ZFS.

The question is still, why is the system panicking? I have now pulled out a different disk, which is for sure on ZFS and not on SVM. The system still runs, but I can't log in anymore and the console doesn't work at all anymore. Even if it has nothing to do with zfs, I don't think this is normal behavior.

Ihsan
-- [EMAIL PROTECTED] http://ihsan.dogan.ch/ http://gallery.dogan.ch/
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] panic with zfs
Am 24.1.2007 14:59 Uhr, Dennis Clarke schrieb:
>> Jan 23 17:25:26 newponit genunix: [ID 408822 kern.info] NOTICE: glm0: fault detected in device; service still available
>> Jan 23 17:25:26 newponit genunix: [ID 611667 kern.info] NOTICE: glm0: Disconnected tagged cmd(s) (1) timeout for Target 0.0
> NCR scsi controllers .. what OS revision is this ? Solaris 10 u 3 ? Solaris Nevada snv_55b ?

[EMAIL PROTECTED] # cat /etc/release
                        Solaris 10 11/06 s10s_u3wos_10 SPARC
           Copyright 2006 Sun Microsystems, Inc.  All Rights Reserved.
                        Use is subject to license terms.
                           Assembled 14 November 2006
[EMAIL PROTECTED] # uname -a
SunOS newponit 5.10 Generic_118833-33 sun4u sparc SUNW,Ultra-60

>> SVM and ZFS disks are on separate SCSI buses, so theoretically there shouldn't be any impact on the SVM disks when I pull out a ZFS disk.
> I still feel that you hit a bug in ZFS somewhere. Under no circumstances should a Solaris server panic and crash simply because you pulled out a single disk that was totally mirrored. In fact .. I will reproduce those conditions here and then see what happens for me.

And Solaris should not hang at all.

Ihsan
-- [EMAIL PROTECTED] http://ihsan.dogan.ch/ http://gallery.dogan.ch/
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Solaris-Supported cards with battery backup
Since we're talking about various hardware configs, does anyone know which controllers with battery backup are supported on Solaris? If we build a big ZFS box I'd like to be able to turn on write caching on the drives but have them battery-backed in the event of a power loss. Are 3ware cards going to be supported any time soon?

I checked and there doesn't seem to be a battery backup option for Thumper. Is that right? Does anyone know if there are plans for that?

-- | Jim Hranicky, Senior SysAdmin UF/CISE Department | | E314D CSE Building    Phone (352) 392-1499 | | [EMAIL PROTECTED]    http://www.cise.ufl.edu/~jfh | --
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] panic with zfs
> Am 24.1.2007 14:59 Uhr, Dennis Clarke schrieb:
>>> Jan 23 17:25:26 newponit genunix: [ID 408822 kern.info] NOTICE: glm0: fault detected in device; service still available
>>> Jan 23 17:25:26 newponit genunix: [ID 611667 kern.info] NOTICE: glm0: Disconnected tagged cmd(s) (1) timeout for Target 0.0
>> NCR scsi controllers .. what OS revision is this ? Solaris 10 u 3 ? Solaris Nevada snv_55b ?
>
> [EMAIL PROTECTED] # cat /etc/release
>                         Solaris 10 11/06 s10s_u3wos_10 SPARC
>            Copyright 2006 Sun Microsystems, Inc.  All Rights Reserved.
>                         Use is subject to license terms.
>                            Assembled 14 November 2006
> [EMAIL PROTECTED] # uname -a
> SunOS newponit 5.10 Generic_118833-33 sun4u sparc SUNW,Ultra-60

oh dear. that's not Solaris Nevada at all. That is production Solaris 10.

>>> SVM and ZFS disks are on separate SCSI buses, so theoretically there shouldn't be any impact on the SVM disks when I pull out a ZFS disk.
>> I still feel that you hit a bug in ZFS somewhere. Under no circumstances should a Solaris server panic and crash simply because you pulled out a single disk that was totally mirrored. In fact .. I will reproduce those conditions here and then see what happens for me.
> And Solaris should not hang at all.

I agree. We both know this. You just recently patched a blastwave server that was running for over 700 days in production and *this* sort of behavior just does not happen in Solaris.

Let me see if I can reproduce your config here:

bash-3.2# metastat -p
d0 -m /dev/md/rdsk/d10 /dev/md/rdsk/d20 1
d10 1 1 /dev/rdsk/c0t1d0s0
d20 1 1 /dev/rdsk/c0t0d0s0
d1 -m /dev/md/rdsk/d11 1
d11 1 1 /dev/rdsk/c0t1d0s1
d4 -m /dev/md/rdsk/d14 1
d14 1 1 /dev/rdsk/c0t1d0s7
d5 -m /dev/md/rdsk/d15 1
d15 1 1 /dev/rdsk/c0t1d0s5
d21 1 1 /dev/rdsk/c0t0d0s1
d24 1 1 /dev/rdsk/c0t0d0s7
d25 1 1 /dev/rdsk/c0t0d0s5
bash-3.2# metadb
        flags           first blk       block count
     a m  p  luo        16              8192            /dev/dsk/c0t0d0s4
     a    p  luo        8208            8192            /dev/dsk/c0t0d0s4
     a    p  luo        16              8192            /dev/dsk/c0t1d0s4
     a    p  luo        8208            8192            /dev/dsk/c0t1d0s4
bash-3.2# zpool status -v zfs0
  pool: zfs0
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        zfs0        ONLINE       0     0     0
          c1t9d0    ONLINE       0     0     0
          c1t10d0   ONLINE       0     0     0
          c1t11d0   ONLINE       0     0     0
          c1t12d0   ONLINE       0     0     0
          c1t13d0   ONLINE       0     0     0
          c1t14d0   ONLINE       0     0     0

errors: No known data errors
bash-3.2#

I will add in mirrors to that zpool from another array on another controller and then yank a disk. However this machine is on snv_52 at the moment.

Dennis
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] panic with zfs
Dennis Clarke wrote: Ihsan Dogan wrote: I think you hit a major bug in ZFS personally. For me it also looks like a bug. I think we don't have enough information to judge. If you have a supported version of Solaris, open a case and supply all the data (crash dump!) you have. I agree we need data. Everything else is just speculation and wild conjecture. I am going to create the same conditions here but with snv_55b and then yank a disk from my zpool. If I get a similar response then I will *hope* for a crash dump. You must be kidding about the open a case however. This is OpenSolaris. no, I'm not. That's why I said If you have a supported version of Solaris. Also, Ihsan seems to disagree about OpenSolaris: [EMAIL PROTECTED] # uname -a SunOS newponit 5.10 Generic_118833-33 sun4u sparc SUNW,Ultra-60 Michael -- Michael SchusterSun Microsystems, Inc. Recursion, n.: see 'Recursion' ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Re: Thumper Origins Q
too much of our future roadmap, suffice it to say that one should expect much, much more from Sun in this vein: innovative software and innovative hardware working together to deliver world-beating systems with undeniable economics. Yes please. Now give me a fairly cheap (but still quality) FC-attached JBOD utilizing SATA/SAS disks and I'll be really happy! :-) This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] unsubscribe
Hi Guys, I completely forgot to unsubscribe to the zfs list before changing email addresses, and no longer have access to the old one. Is there someone I can contact about manually removing my old address, or updating it with my new one? Thanks! --Tim This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Re: Thumper Origins Q
On Jan 24, 2007, at 09:25, Peter Eriksson wrote: too much of our future roadmap, suffice it to say that one should expect much, much more from Sun in this vein: innovative software and innovative hardware working together to deliver world-beating systems with undeniable economics. Yes please. Now give me a fairly cheap (but still quality) FC- attached JBOD utilizing SATA/SAS disks and I'll be really happy! :-) Could you outline why FC attached instead of network attached (iSCSI say) makes more sense to you? It might help to illustrate the demand for an FC target I'm hearing instead of just a network target .. .je ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Re: Thumper Origins Q
I think this will be a hard sell internally, given that it would eat up their own StorageTek line. This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Re: ZFS direct IO
On Jan 24, 2007, at 06:54, Roch - PAE wrote:
> [EMAIL PROTECTED] writes:
>>> Note also that for most applications, the size of their IO operations would often not match the current page size of the buffer, causing additional performance and scalability issues.
>> Thanks for mentioning this, I forgot about it. Since ZFS's default block size is configured to be larger than a page, the application would have to issue page-aligned, block-sized I/Os. Anyone adjusting the block size would presumably be responsible for ensuring that the new size is a multiple of the page size. (If they would want Direct I/O to work...) I believe UFS also has a similar requirement, but I've been wrong before.
>
> I believe the UFS requirement is that the I/O be sector aligned for DIO to be attempted. And Anton did mention that one of the benefits of DIO is the ability to direct-read a subpage block. Without UFS/DIO the OS is required to read and cache the full page, and the extra amount of I/O may lead to data channel saturation (I don't see latency as an issue in here, right?).

In QFS there are mount options to do automatic type switching depending on whether or not the IO is sector aligned. You essentially set a trigger to switch to DIO if you receive a tunable number of well-aligned IO requests. This helps tremendously in certain streaming workloads (particularly write) to reduce overhead.

> This is where I said that such a feature would translate for ZFS into the ability to read parts of a filesystem block, which would only make sense if checksums are disabled.

would it be possible to do checksums a posteriori? .. i suspect that the checksum portion of the transaction may not be atomic though, and this leads us back towards the older notion of a DIF.

> And for RAID-Z that could mean avoiding I/Os to all disks but one in a group, so that's a nice benefit. So for the performance-minded customer that can't afford mirroring, is not much of a fan of data integrity, and needs to do subblock reads against an uncacheable workload, then I can see a feature popping up. And this feature is independent of whether or not the data is DMA'ed straight into the user buffer.

certain streaming write workloads that are time dependent can fall into this category .. if i'm doing a DMA read directly from a device's buffer that i'd like to stream, i probably want to avoid some of the caching layers of indirection that will probably impose more overhead. The idea behind allowing an application to advise the filesystem of how it plans on doing its IO (or the state of its own cache or buffers or stream requirements) is to prevent the one-cache-fits-all sort of approach that we currently seem to have in the ARC.

> The other feature is to avoid a bcopy by DMAing full filesystem block reads straight into the user buffer (and verifying the checksum after). The I/O is high latency; the bcopy adds a small amount. The kernel memory can be freed/reused straight after the user read completes. This is where I ask, how much CPU is lost to the bcopy in workloads that benefit from DIO?

But isn't the cost more than just the bcopy? Isn't there additional overhead in the TLB/PTE from the page invalidation that needs to occur when you do actually go to write the page out or flush the page?

> At this point, there are lots of projects that will lead to performance improvements. The DIO benefits seem like small change in the context of ZFS. The quickest return on investment I see for the directio hint would be to tell ZFS to not grow the ARC when servicing such requests.
How about the notion of multiple ARCs that could be referenced or fine tuned for various types of IO workload profiles to provide a more granular approach? Wouldn't this also keep the page tables smaller and hopefully more contiguous for atomic operations? Not sure what this would break .. .je ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Re: Thumper Origins Q
Peter Eriksson wrote: too much of our future roadmap, suffice it to say that one should expect much, much more from Sun in this vein: innovative software and innovative hardware working together to deliver world-beating systems with undeniable economics. Yes please. Now give me a fairly cheap (but still quality) FC-attached JBOD utilizing SATA/SAS disks and I'll be really happy! :-) ... with write cache and dual redundant controllers? I think we call that the Sun StorageTek 3511. -- richard ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Thumper Origins Q
well, Thumper is actually a reference to Bambi You'd have to ask Fowler, but certainly when he coined it, Bambi was the last thing on anyone's mind. I believe Fowler's intention was one that thumps (or, in the unique parlance of a certain Commander-in-Chief, one that gives a thumpin'). - Bryan -- Bryan Cantrill, Solaris Kernel Development. http://blogs.sun.com/bmc ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Can you turn on zfs compression when the fs is already populated?
I have an 800GB raidz2 zfs filesystem. It already has approx 142Gb of data. Can I simply turn on compression at this point, or do you need to start with compression at the creation time? If I turn on compression now, what happens to the existing data? Thanks, Neal ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Re: Thumper Origins Q
Well, he did say fairly cheap. the ST 3511 is about $18.5k. That's about the same price for the low-end NetApp FAS250 unit. -Moazam On Jan 24, 2007, at 9:40 AM, Richard Elling wrote: Peter Eriksson wrote: too much of our future roadmap, suffice it to say that one should expect much, much more from Sun in this vein: innovative software and innovative hardware working together to deliver world-beating systems with undeniable economics. Yes please. Now give me a fairly cheap (but still quality) FC- attached JBOD utilizing SATA/SAS disks and I'll be really happy! :-) ... with write cache and dual redundant controllers? I think we call that the Sun StorageTek 3511. -- richard ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Can you turn on zfs compression when the fs is already populated?
I have an 800GB raidz2 zfs filesystem. It already has approx 142Gb of data. Can I simply turn on compression at this point, or do you need to start with compression at the creation time? If I turn on compression now, what happens to the existing data? Yes. Nothing. Casper ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Thumper Origins Q
Bryan Cantrill stated: well, Thumper is actually a reference to Bambi I keep thinking of the classic AC/DC song when Fowler and thumpers are mentioned.. s/thunder/thumper/ You'd have to ask Fowler, but certainly when he coined it, Bambi was the last thing on anyone's mind. I believe Fowler's intention was one that thumps (or, in the unique parlance of a certain Commander-in-Chief, one that gives a thumpin'). - Bryan -- Bryan Cantrill, Solaris Kernel Development. http://blogs.sun.com/bmc ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss -- Sean. . ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Can you turn on zfs compression when the fs is already populated?
Neal Pollack wrote: I have an 800GB raidz2 zfs filesystem. It already has approx 142Gb of data. Can I simply turn on compression at this point, or do you need to start with compression at the creation time? As I understand it, you can turn compression on and off at will. Data will be written to the disk according to the compression mode, and either compressed or uncompressed blocks can be read regardless of the setting. If I turn on compression now, what happens to the existing data? Existing (uncompressed) data will remain uncompressed until it is re-written, at which point it may be compressed. Dana ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
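[A minimal sketch of toggling the property on an existing dataset (the dataset name is whatever you already have); compressratio reports how well the newly written blocks are compressing:

# zfs set compression=on pool0/data
# zfs get compression,compressratio pool0/data
# zfs set compression=off pool0/data

Reads work regardless of the current setting, because each block records on disk whether it was written compressed.]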
Re: [zfs-discuss] Re: Thumper Origins Q
On Wed, Jan 24, 2007 at 09:46:11AM -0800, Moazam Raja wrote: Well, he did say fairly cheap. the ST 3511 is about $18.5k. That's about the same price for the low-end NetApp FAS250 unit. Note that the 3511 is being replaced with the 6140: http://www.sun.com/storagetek/disk_systems/midrange/6140/ Also, don't read too much into the prices you see on the website -- that's the list price, and doesn't reflect any discounting. If you're interested in what it _actually_ costs, you should talk to a Sun rep or one of our channel partners to get a quote. (And lest anyone attack the messenger: I'm not defending this system of getting an accurate price, I'm just describing it.) - Bryan -- Bryan Cantrill, Solaris Kernel Development. http://blogs.sun.com/bmc ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Re: Thumper Origins Q
On Wed, 24 Jan 2007, Jonathan Edwards wrote:
>> Yes please. Now give me a fairly cheap (but still quality) FC-attached JBOD utilizing SATA/SAS disks and I'll be really happy! :-)
> Could you outline why FC attached instead of network attached (iSCSI say) makes more sense to you? It might help to illustrate the demand for an FC target I'm hearing instead of just a network target ..

Dunno about FC or iSCSI, but what I'd really like to see is a 1U direct attach 8-drive SAS JBOD, as described (back in May 2006!) here:

http://richteer.blogspot.com/2006/05/sun-storage-product-i-would-like-to.html

Modulo the UltraSCSI 320 stuff perhaps. Given that other vendors have released something similar, and how strong Sun's entry-level server offerings are, I can't believe that Sun hasn't announced something like this, to bring their entry-level storage offerings up to the bar set by their servers...

-- Rich Teer, SCSA, SCNA, SCSECA, OpenSolaris CAB member President, Rite Online Inc. Voice: +1 (250) 979-1638 URL: http://www.rite-group.com/rich
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Thumper Origins Q
On Jan 24, 2007, at 12:41, Bryan Cantrill wrote: well, Thumper is actually a reference to Bambi You'd have to ask Fowler, but certainly when he coined it, Bambi was the last thing on anyone's mind. I believe Fowler's intention was one that thumps (or, in the unique parlance of a certain Commander-in-Chief, one that gives a thumpin'). You can take your pick of things that thump here: http://en.wikipedia.org/wiki/Thumper given the other name is the X4500 .. it does seem like it should be a weapon --- .je ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Re: Thumper Origins Q
On 1/24/07, Jonathan Edwards [EMAIL PROTECTED] wrote: On Jan 24, 2007, at 09:25, Peter Eriksson wrote: too much of our future roadmap, suffice it to say that one should expect much, much more from Sun in this vein: innovative software and innovative hardware working together to deliver world-beating systems with undeniable economics. Yes please. Now give me a fairly cheap (but still quality) FC- attached JBOD utilizing SATA/SAS disks and I'll be really happy! :-) Could you outline why FC attached instead of network attached (iSCSI say) makes more sense to you? It might help to illustrate the demand for an FC target I'm hearing instead of just a network target .. I'm not generally for FC-attached storage, but we've documented here many times how the round trip latency with iSCSI hasn't been the perfect match with ZFS and NFS (think NAS). You need either IB or FC right now to make that workable. Some day though.. either with nvram-backed NFS or cheap 10Gig-E... .je ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Thumper Origins Q
You can take your pick of things that thump here: http://en.wikipedia.org/wiki/Thumper I think it's safe to say that Fowler was thinking more along the lines of whomever dubbed the M79 grenade launcher -- which you can safely bet was not named after a fictional bunny... - Bryan -- Bryan Cantrill, Solaris Kernel Development. http://blogs.sun.com/bmc ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Thumper Origins Q
On Wed, 24 Jan 2007, Sean McGrath - Sun Microsystems Ireland wrote:
>> Bryan Cantrill stated: well, Thumper is actually a reference to Bambi
> I keep thinking of the classic AC/DC song when Fowler and thumpers are mentioned.. s/thunder/thumper/

Yeah, AC/DC songs seem to be most apropos for Sun at the moment:

* Thumperstruck (the subject of this thread)
* For Those About to Rock (the successor to the US-IV)
* Back in Black (Sun's return to profitability as announced yesterday)

Although Queen is almost as good:

* We Will Rock You
* We Are the Champions

And what do M$ users have? Courtesy of the Rolling Stones:

* (I Can't Get No) Satisfaction
* 19th Nervous Breakdown

:-)

-- Rich Teer, SCSA, SCNA, SCSECA, OpenSolaris CAB member President, Rite Online Inc. Voice: +1 (250) 979-1638 URL: http://www.rite-group.com/rich
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Re: Thumper Origins Q
On January 24, 2007 9:40:41 AM -0800 Richard Elling [EMAIL PROTECTED] wrote: Peter Eriksson wrote: Yes please. Now give me a fairly cheap (but still quality) FC-attached JBOD utilizing SATA/SAS disks and I'll be really happy! :-) ... with write cache and dual redundant controllers? I think we call that the Sun StorageTek 3511. Ah but the 3511 JBOD is not supported for direct attach to a host, nor is it supported for attachment to a SAN. You have to have a 3510 or 3511 with RAID controller to use the 3511 JBOD. The RAID controller is pretty pricey on these guys. $5k each IIRC. On January 24, 2007 10:04:04 AM -0800 Bryan Cantrill [EMAIL PROTECTED] wrote: On Wed, Jan 24, 2007 at 09:46:11AM -0800, Moazam Raja wrote: Well, he did say fairly cheap. the ST 3511 is about $18.5k. That's about the same price for the low-end NetApp FAS250 unit. Note that the 3511 is being replaced with the 6140: Which is MUCH nicer but also much pricier. Also, no non-RAID option. You can get a 4Gb FC-SATA RAID with 12*750gb drives for about $10k from third parties. I doubt we'll ever see that from Sun if for no other reason just due to the drive markups. (Which might be justified based on drive qualification; I'm not making any comment as to whether the markup is warranted or not, just that it exists and is obscene.) But you still can't beat thumper overall. I believe S10U3 has iSCSI target support? If so, there you go. Not on the low end in absolute $$$ but certainly in $/GB per bits/sec. Probably better on power too compared to equivalent solutions. -frank ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
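[On the iSCSI target point: a rough sketch of exporting a ZFS volume as an iSCSI target, assuming the build in question actually has the shareiscsi support (it appeared in OpenSolaris/Nevada builds around this time; I'm not certain which Solaris 10 update picked it up). Pool, volume name, and size are made-up examples:

# zfs create -V 100G pool0/iscsivol
# zfs set shareiscsi=on pool0/iscsivol
# iscsitadm list target

If the shareiscsi property isn't recognized, the target support isn't in that release.]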
Re: [zfs-discuss] Thumper Origins Q
On Wed, 24 Jan 2007, Bryan Cantrill wrote: I think it's safe to say that Fowler was thinking more along the lines Presumably, that's John Fowler? -- Rich Teer, SCSA, SCNA, SCSECA, OpenSolaris CAB member President, Rite Online Inc. Voice: +1 (250) 979-1638 URL: http://www.rite-group.com/rich ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Re: Thumper Origins Q
On January 24, 2007 10:02:52 AM -0800 Rich Teer [EMAIL PROTECTED] wrote:
> Dunno about FC or iSCSI, but what I'd really like to see is a 1U direct attach 8-drive SAS JBOD, as described (back in May 2006!) here:
> http://richteer.blogspot.com/2006/05/sun-storage-product-i-would-like-to.html

The problem with that is the 2.5" drives are too expensive and too small.

-frank
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Re: Thumper Origins Q
On 24 Jan 2007, at 13:04, Bryan Cantrill wrote:
>> On Wed, Jan 24, 2007 at 09:46:11AM -0800, Moazam Raja wrote: Well, he did say fairly cheap. the ST 3511 is about $18.5k. That's about the same price for the low-end NetApp FAS250 unit.
> Note that the 3511 is being replaced with the 6140:
> http://www.sun.com/storagetek/disk_systems/midrange/6140/
> Also, don't read too much into the prices you see on the website -- that's the list price, and doesn't reflect any discounting. If you're interested in what it _actually_ costs, you should talk to a Sun rep or one of our channel partners to get a quote. (And lest anyone attack the messenger: I'm not defending this system of getting an accurate price, I'm just describing it.)

If your company can qualify as a start-up (4 years old or less, with fewer than 150 employees) you may want to look at the Sun Startup Essentials program. It provides Sun hardware at big discounts for startups.

http://www.sun.com/emrkt/startupessentials/

For an idea on the levels of discounts see http://kalsey.com/2006/11/sun_startup_essentials_pricing/

-Angelo

> - Bryan
> -- Bryan Cantrill, Solaris Kernel Development. http://blogs.sun.com/bmc

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Re: Thumper Origins Q
Frank Cusack wrote: On January 24, 2007 9:40:41 AM -0800 Richard Elling [EMAIL PROTECTED] wrote: Peter Eriksson wrote: Yes please. Now give me a fairly cheap (but still quality) FC-attached JBOD utilizing SATA/SAS disks and I'll be really happy! :-) ... with write cache and dual redundant controllers? I think we call that the Sun StorageTek 3511. Ah but the 3511 JBOD is not supported for direct attach to a host, nor is it supported for attachment to a SAN. You have to have a 3510 or 3511 with RAID controller to use the 3511 JBOD. The RAID controller is pretty pricey on these guys. $5k each IIRC. I started looking into the 3511 for a ZFS system and just about immediately stopped considering it for this reason. If it is not supported in JBOD, then I might as well go get a third party JBOD at the same level of support. You can get a 4Gb FC-SATA RAID with 12*750gb drives for about $10k from third parties. I doubt we'll ever see that from Sun if for no other reason just due to the drive markups. (Which might be justified based on drive qualification; I'm not making any comment as to whether the markup is warranted or not, just that it exists and is obscene.) Yep. I went with a third party FC/SATA unit which has been flawless as a direct attach for my ZFS JBOD system. Paid about $0.70/GB. And I still have enough money left over this year to upgrade my network core. If I would have gone with Sun, I wouldn't be able to push as many bits across my network. I just don't know how people can afford Sun storage, or even if they can, what drives them to pay such premiums. Sun is missing out on lots of lower end storage, but perhaps that is by design. I am a small shop by many standards, but I would have spent tens of thousands over the last few years with Sun if they had reasonably priced storage. shrug I just need a place to put my bits. Doesn't need to be the fastest, bleeding edge stuff. Just a bucket that performs reasonably, and preferably one that I can use with ZFS. -Shannon ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Re: ZFS direct IO
And this feature is independant on whether or not the data is DMA'ed straight into the user buffer. I suppose so, however, it seems like it would make more sense to configure a dataset property that specifically describes the caching policy that is desired. When directio implies different semantics for different filesystems, customers are going to get confused. The other feature, is to avoid a bcopy by DMAing full filesystem block reads straight into user buffer (and verify checksum after). The I/O is high latency, bcopy adds a small amount. The kernel memory can be freed/reuse straight after the user read completes. This is where I ask, how much CPU is lost to the bcopy in workloads that benefit from DIO ? Right, except that if we try to DMA into user buffers with ZFS there's a bunch of other things we need the VM to do on our behalf to protect the integrity of the kernel data that's living in user pages. Assume you have a high-latency I/O and you've locked some user pages for this I/O. In a pathological case, when another thread tries to access the locked pages and then also blocks, it does so for the duration of the first thread's I/O. At that point, it seems like it might be easier to accept the cost of the bcopy instead of blocking another thread. I'm not even sure how to assess the impact of VM operations required to change the permissions on the pages before we start the I/O. The quickest return on investement I see for the directio hint would be to tell ZFS to not grow the ARC when servicing such requests. Perhaps if we had an option that specifies not to cache data from a particular dataset, that would suffice. I think you've filed a CR along those lines already (6429855)? -j ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Re: Thumper Origins Q
On Wed, 24 Jan 2007, Shannon Roddy wrote: Sun is missing out on lots of lower end storage, but perhaps that is by design. I am a small shop by many standards, but I would have spent tens of thousands over the last few years with Sun if they had reasonably priced storage. shrug I just need a place to put my bits. Doesn't need to be the fastest, bleeding edge stuff. Just a bucket that performs reasonably, and preferably one that I can use with ZFS. +1 -- Rich Teer, SCSA, SCNA, SCSECA, OpenSolaris CAB member President, Rite Online Inc. Voice: +1 (250) 979-1638 URL: http://www.rite-group.com/rich ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Re: Can you turn on zfs compression when the fs is already populated?
I've used the COMPRESS feature for quite a while and you can flip back and forth without any problem. When you turn compression ON, nothing happens to the existing data. However, when you start updating your files, all new blocks will be compressed; so it is possible for your file to be composed of both compressed and uncompressed blocks! This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Re: Converting home directory from ufs to zfs
No such facility exists to automagically convert an existing UFS filesystem to ZFS. You have to create a new ZFS pool/filesystem and then move your data. This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
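[A rough sketch of the manual move; the device, pool, and path names below are hypothetical. Create the pool and filesystem, then copy the data across, e.g. with ufsdump/ufsrestore to preserve permissions and timestamps:

# zpool create home c1t2d0
# zfs create home/users
# ufsdump 0f - /export/home | (cd /home/users && ufsrestore rf -)

Once you've verified the copy, the old UFS filesystem can be unmounted and its slice reused or added to the pool.]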
Re: [zfs-discuss] zpool split
On 23/01/07, Darren J Moffat [EMAIL PROTECTED] wrote: Can you pick another name for this please because that name has already been suggested for zfs(1) where the argument is a directory in an existing ZFS file system and the result is that the directory becomes a new ZFS file system while retaining its contents. Sorry to jump in on the thread, but - that's an excellent feature addition, look forward to it. Will it be accompanied by a 'zfs join'? -- Rasputin :: Jack of All Trades - Master of Nuns http://number9.hellooperator.net/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Thumper Origins Q
Bryan Cantrill wrote: well, Thumper is actually a reference to Bambi You'd have to ask Fowler, but certainly when he coined it, Bambi was the last thing on anyone's mind. I believe Fowler's intention was one that thumps (or, in the unique parlance of a certain Commander-in-Chief, one that gives a thumpin'). me, I always thought of calling sandworms. sandworms use up a lot of space, you see... And bring in a lot of cash (IIRC, the worms caused the spice and the spice was mined) It was my association too. Casper ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Thumper Origins Q
[EMAIL PROTECTED] wrote: Bryan Cantrill wrote: well, Thumper is actually a reference to Bambi You'd have to ask Fowler, but certainly when he coined it, Bambi was the last thing on anyone's mind. I believe Fowler's intention was one that thumps (or, in the unique parlance of a certain Commander-in-Chief, one that gives a thumpin'). me, I always thought of calling sandworms. sandworms use up a lot of space, you see... And bring in a lot of cash (IIRC, the worms caused the spice and the spice was mined) It was my association too. ...and if you imagine 48 head-positioner arms moving at once, you can imagine the vibration travelling through the sand, is all. Just means it's a good name, I suppose! ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Solaris-Supported cards with battery backup
Hello James, Wednesday, January 24, 2007, 3:20:14 PM, you wrote: JFH Since we're talking about various hardware configs, does anyone know JFH which controllers with battery backup are supported on Solaris? If JFH we build a big ZFS box I'd like to be able to turn on write caching JFH on the drives but have them battery-backed in the event of a power JFH loss. Are 3ware cards going to be supported any time soon? JFH I checked and there doesn't seem to be a battery backup option JFH for Thumper. Is that right? Does anyone know if there are plans for JFH that? ZFS itself makes sure the transaction is on disk by issuing a write-cache flush command to the disks. So you don't have to worry about it. -- Best regards, Robert mailto:[EMAIL PROTECTED] http://milek.blogspot.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Solaris-Supported cards with battery backup
On Jan 24, 2007, at 1:57 PM, Robert Milkowski wrote: Hello James, Wednesday, January 24, 2007, 3:20:14 PM, you wrote: JFH Since we're talking about various hardware configs, does anyone know JFH which controllers with battery backup are supported on Solaris? If JFH we build a big ZFS box I'd like to be able to turn on write caching JFH on the drives but have them battery-backed in the event of a power JFH loss. Are 3ware cards going to be supported any time soon? JFH I checked and there doesn't seem to be a battery backup option JFH for Thumper. Is that right? Does anyone know if there are plans for JFH that? ZFS itself makes sure the transaction is on disk by issuing a write-cache flush command to the disks. So you don't have to worry about it. Areca SATA cards are supported on Solaris x86 by Areca (drivers etc from them, not from Sun) and they support battery backup. It is what I am using. Chad --- Chad Leigh -- Shire.Net LLC Your Web App and Email hosting provider chad at shire.net ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Solaris-Supported cards with battery backup
Robert Milkowski wrote: Hello James, Wednesday, January 24, 2007, 3:20:14 PM, you wrote: JFH Since we're talking about various hardware configs, does anyone know JFH which controllers with battery backup are supported on Solaris? If JFH we build a big ZFS box I'd like to be able to turn on write caching JFH on the drives but have them battery-backed in the event of a power JFH loss. Are 3ware cards going to be supported any time soon? JFH I checked and there doesn't seem to be a battery backup option JFH for Thumper. Is that right? Does anyone know if there are plans for JFH that? ZFS itself makes sure the transaction is on disk by issuing a write-cache flush command to the disks. So you don't have to worry about it. OK, does that negate the performance gains of having the write cache on? I guess what I'm really asking is: given the problems I and others have noted with NFS/ZFS, what's currently the best way to get good NFS performance without sacrificing reliability (e.g., by disabling the ZIL)? If a battery-backed cache isn't necessary, all the better. Jim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Synchronous Mount?
Specifically, I was trying to compare ZFS snapshots with LVM snapshots on Linux. One of the tests does writes to an ext3 filesystem (that's on top of an LVM snapshot) mounted synchronously, in order to measure the real copy-on-write overhead. So, I was wondering if I could do the same with ZFS. Seems not. Given that ZFS does COW for *all* writes, what does this test actually intend to show when running on ZFS? Am I missing something, or shouldn't writes to a clone be as fast as, or even faster than, writes to a non-clone? COW is always performed; it's just that in the case of the clone the old data is not freed. -- / Peter Schuller, InfiDyne Technologies HB PGP userID: 0xE9758B7D or 'Peter Schuller [EMAIL PROTECTED]' Key retrieval: Send an E-Mail to [EMAIL PROTECTED] E-Mail: [EMAIL PROTECTED] Web: http://www.scode.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Synchronous Mount?
Specifically, I was trying to compare ZFS snapshots with LVM snapshots on Linux. One of the tests does writes to an ext3 filesystem (that's on top of an LVM snapshot) mounted synchronously, in order to measure the real copy-on-write overhead. So, I was wondering if I could do the same with ZFS. Seems not. Given that ZFS does COW for *all* writes, what does this test actually intend to show when running on ZFS? Am I missing something, or shouldn't writes to a clone be as fast as, or even faster than, writes to a non-clone? COW is always performed; it's just that in the case of the clone the old data is not freed. Well, yes - for ZFS. But that's not the case with LVM snapshots. Doing the same (sync mount) on ZFS was just to compare them on similar grounds. Anyway, I figured ZFS performs way better. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
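[Editorial note: for what it's worth, a quick way to compare the two cases directly on ZFS. This is a rough sketch, not a rigorous benchmark; the dataset names and sizes are made up.]

  # snapshot and clone an existing filesystem
  zfs snapshot tank/data@base
  zfs clone tank/data@base tank/clone
  # time the same write workload against the origin and the clone
  ptime dd if=/dev/zero of=/tank/data/testfile bs=128k count=8192
  ptime dd if=/dev/zero of=/tank/clone/testfile bs=128k count=8192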
[zfs-discuss] Re: Re: Thumper Origins Q
#1 is speed. You can aggregate 4x1Gbit ethernet and still not touch 4Gb/sec FC. #2 is drop-in compatibility. I'm sure people would love to drop this into an existing SAN. #2 is the key for me. And I also have a #3: FC has been around a long time now. The HBAs and switches are (more or less :-) debugged and we know how things work... iSCSI - well, perhaps. But to me that feels like it gets too far away from the hardware. I'd like to keep the distance between the disks and ZFS as short as possible. I.e.: ZFS - HBA - FC switch - JBOD - simple FC-to-SATA converter - SATA disk. This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re[2]: [zfs-discuss] Solaris-Supported cards with battery backup
Hello James, Wednesday, January 24, 2007, 10:31:46 PM, you wrote: JFH Robert Milkowski wrote: Hello James, Wednesday, January 24, 2007, 3:20:14 PM, you wrote: JFH Since we're talking about various hardware configs, does anyone know JFH which controllers with battery backup are supported on Solaris? If JFH we build a big ZFS box I'd like to be able to turn on write caching JFH on the drives but have them battery-backed in the event of a power JFH loss. Are 3ware cards going to be supported any time soon? JFH I checked and there doesn't seem to be a battery backup option JFH for Thumper. Is that right? Does anyone know if there are plans for JFH that? ZFS itself makes sure the transaction is on disk by issuing a write-cache flush command to the disks. So you don't have to worry about it. JFH OK, does that negate the performance gains of having the write cache JFH on? JFH I guess what I'm really asking is: given the problems I and others have JFH noted with NFS/ZFS, what's currently the best way to get good NFS JFH performance without sacrificing reliability (e.g., by disabling the JFH ZIL)? JFH If a battery-backed cache isn't necessary, all the better. I thought you were worried about the write cache in the disks - if you dedicate whole disks to ZFS on the x4500, the write cache on the disks will be enabled by default. But if you are talking about another level of cache, then you're right - currently you can't do that on the x4500. -- Best regards, Robert mailto:[EMAIL PROTECTED] http://milek.blogspot.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
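[Editorial note: a small sketch of the distinction Robert is drawing; the device and pool names are illustrative, and the two commands show alternatives rather than a sequence. When ZFS is given whole disks it can safely enable each drive's write cache; when given slices it leaves the cache setting alone.]

  # whole disks: ZFS puts an EFI label on them and enables the drives' write cache
  zpool create tank c0t0d0 c1t0d0
  # slices: ZFS cannot assume it owns the whole disk, so the write cache is not touched
  zpool create tank2 c5t0d0s4 c5t4d0s4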
Re: [zfs-discuss] On-failure policies for pools
comment below... Peter Schuller wrote: In many situations it may not feel worth it to move to a raidz2 just to avoid this particular case. I can't think of any, but then again, I get paid to worry about failures :-) Given that one of the touted features of ZFS is data integrity, including in the case of cheap drives, that implies it is of interest to get maximum integrity with any given amount of resources. In your typical home use situation, for example, buying 4 drives of decent size is pretty expensive considering that it *is* home use. Getting 4 drives for the diskspace of 3 is a lot more attractive than 5 drives for the diskspace of 3. But given that you do get 4 drives and put them in a raidz, you want as much safety as possible, and often you don't care that much about availability. That said, the argument scales. If you're not in a situation like the above, you may easily warrant wasting an extra drive on raidz2. But raidz2 without this feature is still less safe than raidz2 with the feature. So moving back to the idea of getting as much redundancy as possible given a certain set of hardware resources, you're still not optimal given your hardware. Please correct me if I misunderstand your reasoning: are you saying that a broken disk should not be replaced? Sorry, no. However, I realize my desire actually requires an additional feature. The situation I envision is this: * One disk goes down in a raidz, because the controller suddenly broke (platters/heads are fine). * You replace the disk and start a resilvering. * You trigger a bad block. At this point, you are now pretty screwed, unless: * The pool did not change after the original drive failed, AND broken-drive-assisted resilvering is supported. You go to whatever effort is required to fix the disk (say, buy another one of the same model and replace the controller, or hire some company that does this stuff), and re-insert it into the machine. * At this point you have a drive you can read data off of, but that you certainly don't trust in general. So you want to start replacing the drive with the new drive; if ZFS were then able to resilver to the new drive by using both the parity data on the other healthy drives in the pool and the disk being replaced, you're happy. It is my understanding that zpool replace already does this. Just don't remove the failing disk... Or let's do a more likely scenario. A disk starts dying because of bad sectors (the disk has run out of remapping possibilities). You cannot fix this anymore by re-writing the bad sectors; trying to re-write the sector ends up failing with an I/O error and ZFS kicks the disk out of the pool. Standard procedure at this point is to replace the drive and resilver. But once again - you might end up with a bad sector on another drive. Without utilizing the existing broken drive, you're screwed. If, however, you were able to take advantage of the sectors that *ARE* readable off of the drive, and the drive has *NOT* gone out of date since it was kicked out due to additional transaction commits, you are once again happy. (Once again, assuming you don't happen to have bad luck and the sets of bad sectors on the two drives overlap.) ... I think I was off base previously. It seems to me that you are really after the policy for failing/failed disks. Currently, the only way a drive gets kicked out is if ZFS cannot open it. Obviously, if ZFS cannot open the drive, then you won't be able to read anything from it. Looking forward, I think that there are several policies which may be desired... 
If so, then that is contrary to the accepted methods used in most mission-critical systems. There may be other methods which meet your requirements and are accepted. For example, one procedure we see for those sites that are very interested in data retention is to power off a system when it is degraded to a point (as specified) where data retention is put at unacceptable risk. This is kind of what I am after, except that I want to guarantee that not a single transaction gets committed once a pool is degraded. Even if an admin goes and turns the machine off, the disk will be out of date. ... such as a policy that says if a disk is going bad, go read-only. I'm quite sure that most applications won't respond well to such a policy, though. The theory is that a powered-down system will stop wearing out. When the system is serviced, then it can be brought back online. Obviously, this is not the case where data availability is a primary requirement -- data retention has higher priority. On the other hand, hardware has a nasty tendency to break in relation to power cycles... We can already set a pool (actually the file systems in a pool) to be read-only. Automatically and *immediately* on a drive failure? You can listen to sysevents and implement policies. There may be something else lurking here that we might be able to take advantage of.
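[Editorial note: a rough sketch of the sysevent route mentioned above. The event class/subclass names are the ones defined in sys/sysevent/eventdefs.h, but exactly which ZFS subclasses are delivered in any given build is an assumption worth verifying, and the handler script path is made up.]

  # run a policy script whenever a vdev is removed or a resilver finishes
  syseventadm add -c EC_zfs -s ESC_ZFS_vdev_remove /usr/local/sbin/pool-policy.sh
  syseventadm add -c EC_zfs -s ESC_ZFS_resilver_finish /usr/local/sbin/pool-policy.sh
  syseventadm restart
  # the script could, for instance, force the affected filesystems read-only:
  #   zfs set readonly=on tank/data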
[zfs-discuss] Why replacing a drive generates writes to other disks?
Hello zfs-discuss, Subject says it all. I first checked - no I/O activity at all to the pool named thumper-2. So I started replacing one drive with 'zpool replace thumper-2 c7t7d0 c4t1d0'. Now the question is: why am I seeing writes to disks other than c7t7d0? Also, when replacing a disk, why don't we just copy disk-to-disk? It would be MUCH faster here. Probably because we're traversing metadata? But perhaps it could be done in a clever way so we end up just copying from one disk to another. Checking parity or checksums isn't necessary here - that's what scrub is for. What we want in most cases is to replace the drive as fast as possible. On another thumper I have a failing drive (port resets, etc.), so I issued a drive replacement over a week ago. Well, it still hasn't completed even 4% in a week! The pool config is the same. It's just way too slow and, in the long term, risky.

bash-3.00# zpool status
  pool: thumper-2
 state: ONLINE
status: One or more devices is currently being resilvered. The pool will continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress, 0,01% done, 350h29m to go
config:

        NAME          STATE     READ WRITE CKSUM
        thumper-2     ONLINE       0     0     0
          raidz2      ONLINE       0     0     0
            c0t0d0    ONLINE       0     0     0
            c1t0d0    ONLINE       0     0     0
            c4t0d0    ONLINE       0     0     0
            c6t0d0    ONLINE       0     0     0
            c7t0d0    ONLINE       0     0     0
            c0t1d0    ONLINE       0     0     0
            c1t1d0    ONLINE       0     0     0
            c5t1d0    ONLINE       0     0     0
            c6t1d0    ONLINE       0     0     0
            c7t1d0    ONLINE       0     0     0
            c0t2d0    ONLINE       0     0     0
          raidz2      ONLINE       0     0     0
            c1t2d0    ONLINE       0     0     0
            c5t2d0    ONLINE       0     0     0
            c6t2d0    ONLINE       0     0     0
            c7t2d0    ONLINE       0     0     0
            c0t4d0    ONLINE       0     0     0
            c1t4d0    ONLINE       0     0     0
            c4t4d0    ONLINE       0     0     0
            c6t4d0    ONLINE       0     0     0
            c7t4d0    ONLINE       0     0     0
            c0t3d0    ONLINE       0     0     0
            c1t3d0    ONLINE       0     0     0
          raidz2      ONLINE       0     0     0
            c4t3d0    ONLINE       0     0     0
            c5t3d0    ONLINE       0     0     0
            c6t3d0    ONLINE       0     0     0
            c7t3d0    ONLINE       0     0     0
            c0t5d0    ONLINE       0     0     0
            c1t5d0    ONLINE       0     0     0
            c4t5d0    ONLINE       0     0     0
            c5t5d0    ONLINE       0     0     0
            c6t5d0    ONLINE       0     0     0
            c7t5d0    ONLINE       0     0     0
            c0t6d0    ONLINE       0     0     0
          raidz2      ONLINE       0     0     0
            c1t6d0    ONLINE       0     0     0
            c4t6d0    ONLINE       0     0     0
            c5t6d0    ONLINE       0     0     0
            c6t6d0    ONLINE       0     0     0
            c7t6d0    ONLINE       0     0     0
            c0t7d0    ONLINE       0     0     0
            c1t7d0    ONLINE       0     0     0
            c4t7d0    ONLINE       0     0     0
            c5t7d0    ONLINE       0     0     0
            c6t7d0    ONLINE       0     0     0
            spare     ONLINE       0     0     0
              c7t7d0  ONLINE       0     0     0
              c4t1d0  ONLINE       0     0     0
        spares
          c4t1d0      INUSE     currently in use
          c4t2d0      AVAIL

errors: No known data errors

  pool: zones
 state: ONLINE
 scrub: none requested
config:

        NAME          STATE     READ WRITE CKSUM
        zones         ONLINE       0     0     0
          mirror      ONLINE       0     0     0
            c5t0d0s4  ONLINE       0     0     0
            c5t4d0s4  ONLINE       0     0     0

errors: No known data errors

bash-3.00# iostat -xnz 1
[...]
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
  114.0    0.0 7232.3    0.0  5.9  0.8   51.9    6.7  74  76 c0t0d0
  132.0    0.0 8320.6    0.0  9.0  1.0   68.4    7.5  95  98 c6t0d0
  123.0    0.0 7807.7    0.0  7.3  0.8   59.3    6.3  76  77 c7t0d0
  115.0    0.0 7296.3    0.0  7.9  0.8   68.7    7.1  80  81 c4t0d0
  100.0    0.0 6336.4    0.0  3.6  0.6   36.3    6.0  56  60 c6t1d0
    0.0  297.0    0.0  151.0  0.0  0.0    0.0    0.2   0   5 c4t1d0
  106.0    0.0 6720.3    0.0  5.3  0.6   50.0    6.1  63  65 c7t1d0
  122.0    0.0 7743.7    0.0  6.9  0.7   56.8    6.0  72  73 c0t1d0
  120.0    0.0 7679.2    0.0  5.6  0.7   46.9    5.7  66  68 c1t1d0
    4.0    0.0  129.5    0.0  0.0
Re: [zfs-discuss] Re: Thumper Origins Q
Ben Gollmer wrote: On Jan 24, 2007, at 12:37 PM, Shannon Roddy wrote: I went with a third party FC/SATA unit which has been flawless as a direct attach for my ZFS JBOD system. Paid about $0.70/GB. What did you use, if you don't mind my asking? Arena Janus 6641. Turns out I underestimated what I paid per GB. I went back and dug up the invoice and I paid just under $1/GB. My memory was a little off on the 750 GB drive prices. I used an LSI Logic FC card that was listed on the Solaris Ready page, and I am using the LSI Logic driver. http://www.sun.com/io_technologies/vendor/lsi_logic_corporation.html Works fine for our purposes, but again, we don't need screaming bleeding edge performance either. -Shannon ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Thumper Origins Q
On Jan 24, 2007, at 04:06, Bryan Cantrill wrote: On Wed, Jan 24, 2007 at 12:15:21AM -0700, Jason J. W. Williams wrote: Wow. That's an incredibly cool story. Thank you for sharing it! Does the Thumper today pretty much resemble what you saw then? Yes, amazingly so: 4-way, 48 spindles, 4u. The real beauty of the match between ZFS and Thumper was (and is) that ZFS unlocks new economics in storage -- smart software achieving high performance and ultra-high If Thumper and ZFS were born independently, how were all those disks going to be used without ZFS? It seems logical that the two be mated, but AFAIK there is no hardware RAID available in Thumpers. Was normal software RAID the plan? Treating each disk as a separate mount point? Just curious. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re[2]: [zfs-discuss] Thumper Origins Q
Hello David, Thursday, January 25, 2007, 1:47:57 AM, you wrote: DM On Jan 24, 2007, at 04:06, Bryan Cantrill wrote: On Wed, Jan 24, 2007 at 12:15:21AM -0700, Jason J. W. Williams wrote: Wow. That's an incredibly cool story. Thank you for sharing it! Does the Thumper today pretty much resemble what you saw then? Yes, amazingly so: 4-way, 48 spindles, 4u. The real beauty of the match between ZFS and Thumper was (and is) that ZFS unlocks new economics in storage -- smart software achieving high performance and ultra-high DM If Thumper and ZFS were born independently, how were all those disks DM going to be used without ZFS? It seems logical that the two be mated, DM but AFAIK there is no hardware RAID available in Thumpers. DM Was normal software RAID the plan? Treating each disk as a separate DM mount point? I guess Linux was considered probably with LVM or something else. -- Best regards, Robertmailto:[EMAIL PROTECTED] http://milek.blogspot.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Why replacing a drive generates writes to other disks?
On Thu, Jan 25, 2007 at 12:39:25AM +0100, Robert Milkowski wrote: Hello zfs-discuss, On another thumper I have a failing drive (port resets, etc.) so I issued over a week ago drive replacement. Well it still hasn't completed even 4% in a week! The pool config is the same. It's just wy to slow and in a long term risky. The last time I saw something like this was on a D1000 that had serious parity issues. Overall it spent so much time retrying and backing down the transfer rate that the data path to the disks was so slow as to be unusable. Got new cables and the problem went away. Don't know if that applies to you or not. -brian ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Thumper Origins Q
Hi Wee, Having snapshots in the filesystem that work so well is really nice. How are y'all quiescing the DB? Best Regards, J On 1/24/07, Wee Yeh Tan [EMAIL PROTECTED] wrote: On 1/25/07, Bryan Cantrill [EMAIL PROTECTED] wrote: ... after all, what was ZFS going to do with that expensive but useless hardware RAID controller? ... I almost rolled over reading this. This is exactly what I went through when we moved our database server out from Vx** to ZFS. We had a 3510 and were thinking how best to configure the RAID. In the end, we ripped out the controller board and used the 3510 as a JBOD directly attached to the server. My DBA was so happy with this setup (especially with the snapshot capability) he is asking for another such setup. -- Just me, Wire ... ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
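[Editorial note: on the quiescing question, one common pattern is sketched below. The hot-backup statements shown are Oracle's, and the dataset name is made up; adjust for whatever the site actually runs - the thread does not say how Wee's DBA does it.]

  # put the database into hot-backup mode (Oracle syntax, as an example)
  echo 'alter database begin backup;' | sqlplus -s '/ as sysdba'
  # take the near-instant ZFS snapshot of the dataset holding the datafiles
  zfs snapshot tank/oradata@nightly
  # let the database resume normal operation
  echo 'alter database end backup;' | sqlplus -s '/ as sysdba'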
[zfs-discuss] Re: zpool split
...such that a snapshot (cloned if need be) won't do what you want? This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zpool split
On 25/01/07, Adam Leventhal [EMAIL PROTECTED] wrote: On Wed, Jan 24, 2007 at 08:52:47PM +, Dick Davies wrote: that's an excellent feature addition; I look forward to it. Will it be accompanied by a 'zfs join'? Out of curiosity, what will you (or anyone else) use this for? If the idea is to copy datasets to a new pool, why not use zfs send/receive? To clarify, I'm talking about 'zfs split' as in breaking /tank/export/home into /tank/export/home/user1, /tank/export/home/user2, etc. The 'zfs join' is just an undo to help me out when I've been overzealous: every directory in my system is a filesystem, and I have more automated snapshots than I can stand... -- Rasputin :: Jack of All Trades - Master of Nuns http://number9.hellooperator.net/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
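[Editorial note: roughly what the proposed 'zfs split' would automate - today the same effect takes a manual dance, something like the sketch below. The paths come from the example above; the .old suffix is just an illustration.]

  # move the directory aside, create a real filesystem in its place, copy back
  mv /tank/export/home/user1 /tank/export/home/user1.old
  zfs create tank/export/home/user1
  (cd /tank/export/home/user1.old && find . -depth -print | cpio -pdm /tank/export/home/user1)
  rm -rf /tank/export/home/user1.old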
Re: [zfs-discuss] X2100 not hotswap, was Re: External drive enclosures + Sun Server for massstorage
On January 23, 2007 8:11:24 PM -0200 Toby Thain [EMAIL PROTECTED] wrote: Still, would be nice for those of us who bought them. And judging by other posts on this thread it seems just about everyone assumes hotswap just works. hot *plug* :-) -frank ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss