Re: [zfs-discuss] LAST CALL: zfs-discuss is moving Sunday, March 24, 2013
However, the zfs-discuss list seems to be archived at gmane.

On 2013-03-22 22:57, Cindy Swearingen wrote:
> I hope to see everyone on the other side... *** The ZFS discussion list is moving to java.net. This opensolaris/zfs discussion will not be available after March 24. There is no way to migrate the existing list to the new list.
[zfs-discuss] Drives going offline in Zpool
Hi, I have a Dell MD1200 connected to two heads (Dell R710). The heads have a PERC H800 card, and the drives are configured as RAID0 virtual disks in the RAID controller. One of the drives crashed and was replaced by a spare. Resilvering was triggered but fails to complete because drives keep going offline. I have to reboot the head (R710) for the drives to come back online. This has happened repeatedly: the resilver hung at 4% done, the head was rebooted, then it hung again at 27% done, and so on. The issue happens with both Solaris 11.1 and OmniOS. It's a 100TB pool with 69TB used. I have critical data and can't afford to lose it. Can I recover the data anyway (at least partially)? I have verified there is no hardware issue with the H800 and have also upgraded the H800 firmware. The issue happens with both heads. Current OS: Solaris 11.1

Mar 22 21:47:55 solaris scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,340e@7/pci1028,1f15@0/sd@12,0 (sd26):
Mar 22 21:47:55 solaris  Command failed to complete...Device is gone
Mar 22 21:47:55 solaris scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,340e@7/pci1028,1f15@0/sd@c,0 (sd20):
Mar 22 21:47:55 solaris  Command failed to complete...Device is gone
Mar 22 21:47:55 solaris scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,340e@7/pci1028,1f15@0/sd@18,0 (sd32):
Mar 22 21:47:55 solaris  Command failed to complete...Device is gone
Mar 22 21:47:55 solaris scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,340e@7/pci1028,1f15@0/sd@1c,0 (sd36):
Mar 22 21:47:55 solaris  Command failed to complete...Device is gone
Mar 22 21:47:55 solaris scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,340e@7/pci1028,1f15@0/sd@1b,0 (sd35):
Mar 22 21:47:55 solaris  Command failed to complete...Device is gone
Mar 22 21:47:55 solaris scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,340e@7/pci1028,1f15@0/sd@1e,0 (sd38):
Mar 22 21:47:55 solaris  Command failed to complete...Device is gone
Mar 22 21:47:55 solaris scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,340e@7/pci1028,1f15@0/sd@19,0 (sd33):
Mar 22 21:47:55 solaris  Command failed to complete...Device is gone
Mar 22 21:47:55 solaris scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,340e@7/pci1028,1f15@0/sd@1d,0 (sd37):
Mar 22 21:47:55 solaris  Command failed to complete...Device is gone
Mar 22 21:47:55 solaris scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,340e@7/pci1028,1f15@0/sd@27,0 (sd47):
Mar 22 21:47:55 solaris  Command failed to complete...Device is gone
Mar 22 21:47:55 solaris scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,340e@7/pci1028,1f15@0/sd@26,0 (sd46):
Mar 22 21:47:55 solaris  Command failed to complete...Device is gone

# zpool status -v
  pool: test
 state: DEGRADED
status: One or more devices is currently being resilvered. The pool will
        continue to function in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Wed Mar 20 19:13:40 2013
        27.4T scanned out of 69.6T at 183M/s, 67h11m to go
        2.43T resilvered, 39.32% done
config:

        NAME                        STATE     READ WRITE CKSUM
        test                        DEGRADED     0     0     0
          raidz1-0                  DEGRADED     0     0     0
            c8t0d0                  ONLINE       0     0     0
            c8t1d0                  DEGRADED     0     0     0
            c8t2d0                  DEGRADED     0     0     0
            c8t3d0                  ONLINE       0     0     0
            spare-4                 DEGRADED     0     0     0
              12459181442598970150  UNAVAIL      0     0     0
              c8t45d0               DEGRADED     0     0     0  (resilvering)
          raidz1-1                  ONLINE       0     0     0
            c8t5d0                  ONLINE       0     0     0
            c8t6d0                  ONLINE       0     0     0
            c8t7d0                  ONLINE       0     0     0
            c8t8d0                  ONLINE       0     0     0
            c8t9d0                  ONLINE       0     0     0
          raidz1-3                  DEGRADED     0     0     0
            c8t12d0                 ONLINE       0     0     0
            c8t13d0                 ONLINE       0     0     0
            c8t14d0                 ONLINE       0     0     0
            c8t15d0                 DEGRADED     0     0     0
            c8t16d0                 ONLINE       0     0     0
            c8t17d0                 ONLINE       0     0     0
            c8t18d0                 ONLINE       0     0     0
            c8t19d0                 ONLINE       0     0     0
            c8t20d0                 DEGRADED     0     0     0
            c8t21d0                 DEGRADED     0     0     0
            spare-10                DEGRADED     0     0     0
              c8t22d0               DEGRADED     0     0     0
              c8t47d0               DEGRADED     0
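When drives drop out mid-resilver like this, the FMA telemetry and per-device error counters usually indicate whether the transport (controller/enclosure path) or the disks themselves are failing. A hedged first-look sketch, with the pool and device names taken from the output above:

# fmdump -eV | tail -40     # raw FMA ereports; look for transport vs. media errors
# iostat -En c8t45d0        # soft/hard/transport error counters for the resilvering spare
# zpool clear test          # after a reboot, clear transient errors so the resilver can resume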
[zfs-discuss] How to enforce probing of all disks?
Hello all, I have a kind of lame question here: how can I force the system (OI) to probe all the HDD controllers and disks that it can find, and be certain that it has searched everywhere for disks? My remotely supported home-NAS PC was unavailable for a while, and a friend rebooted it for me from a LiveUSB image with SSH (oi_148a). I can see my main pool disks, but not the old boot (rpool) drive. Meaning that it appears neither in zpool import nor in format output. While it is possible that it has finally kicked the bucket - and that wouldn't really be unexpected - I'd like to try and confirm. For example, it might fail to spin up or make contact with the SATA cable initially, but subsequent probing of the same controller might just find it. That has happened before, too - though via a reboot and full POST... The friend won't be available for a few days, and there's no other remote management or inspection facility for this box, so I'd like to probe from within OI as much as I can. Should be an educational quest, too ;)

# cfgadm -al
Ap_Id                Type       Receptacle   Occupant      Condition
Slot36               sata/hp    connected    configured    ok
sata0/0::dsk/c5t0d0  disk       connected    configured    ok
sata0/1::dsk/c5t1d0  disk       connected    configured    ok
sata0/2::dsk/c5t2d0  disk       connected    configured    ok
sata0/3::dsk/c5t3d0  disk       connected    configured    ok
sata0/4::dsk/c5t4d0  disk       connected    configured    ok
sata0/5::dsk/c5t5d0  disk       connected    configured    ok
sata1/0              sata-port  empty        unconfigured  ok
sata1/1              sata-port  empty        unconfigured  ok
... (USB reports follow)

# devfsadm -Cv -- nothing new found

Nothing of interest in dmesg...

# scanpci -v | grep -i ata
Intel Corporation 82801HR/HO/HH (ICH8R/DO/DH) 6 port SATA AHCI Controller
JMicron Technology Corp. JMB362/JMB363 Serial ATA Controller
JMicron Technology Corp. JMB362/JMB363 Serial ATA Controller

# prtconf -v | grep -i ata
    name='ata-dma-enabled' type=string items=1
    name='atapi-cd-dma-enabled' type=string items=1
    value='ADATA USB Flash Drive'
    value='ADATA'
    value='ADATA'
    name='sata' type=int items=1 dev=none
    value='SATA AHCI 1.0 Interface'
    dev_link=/dev/cfg/sata1/0
    dev_link=/dev/cfg/sata1/1
    name='ata-options' type=int items=1
    value='atapi'
    name='sata' type=int items=1 dev=none
    value='\_SB_.PCI0.SATA'
    value='SATA AHCI 1.0 Interface'
    dev_link=/dev/cfg/sata0/0
    dev_link=/dev/cfg/sata0/1
    dev_link=/dev/cfg/sata0/2
    dev_link=/dev/cfg/sata0/3
    dev_link=/dev/cfg/sata0/4
    dev_link=/dev/cfg/sata0/5
    value='id1,sd@SATA_ST2000DL003-9VT15YD217ZL'
    name='sata-phy' type=int items=1
    value='scsiclass,00.vATA.pST2000DL003-9VT1.rCC32' + 'scsiclass,00.vATA.pST2000DL003-9VT1' + 'scsiclass,00' + 'scsiclass'
    value='id1,sd@SATA_ST2000DL003-9VT15YD1XWWB'
    name='sata-phy' type=int items=1
    value='scsiclass,00.vATA.pST2000DL003-9VT1.rCC32' + 'scsiclass,00.vATA.pST2000DL003-9VT1' + 'scsiclass,00' + 'scsiclass'
    value='id1,sd@SATA_ST2000DL003-9VT15YD1VLKC'
    name='sata-phy' type=int items=1
    value='scsiclass,00.vATA.pST2000DL003-9VT1.rCC32' + 'scsiclass,00.vATA.pST2000DL003-9VT1' + 'scsiclass,00' + 'scsiclass'
    value='id1,sd@SATA_ST2000DL003-9VT15YD21QZL'
    name='sata-phy' type=int items=1
    value='scsiclass,00.vATA.pST2000DL003-9VT1.rCC32' + 'scsiclass,00.vATA.pST2000DL003-9VT1' + 'scsiclass,00' + 'scsiclass'
    value='id1,sd@SATA_ST2000DL003-9VT15YD24GCA'
    name='sata-phy' type=int items=1
    value='scsiclass,00.vATA.pST2000DL003-9VT1.rCC32' + 'scsiclass,00.vATA.pST2000DL003-9VT1' + 'scsiclass,00' + 'scsiclass'
    value='id1,sd@SATA_ST2000DL003-9VT15YD24GDG'
    name='sata-phy' type=int items=1
    value='scsiclass,00.vATA.pST2000DL003-9VT1.rCC32' + 'scsiclass,00.vATA.pST2000DL003-9VT1' + 'scsiclass,00' + 'scsiclass'

This only sees the six ST2000DL003 drives of the main data pool, and the LiveUSB flash drive... So - is it possible to try reinitializing and locating connections to the disk on a commodity motherboard (i.e. no lsiutil, IPMI and such) using only OI, without rebooting the box? The pools are not imported, so if I can detach and reload the sata drivers I might try that, but I am stumped at how
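One avenue worth trying is forcing a reprobe of an individual port through cfgadm's SATA plugin. This is only a sketch, not verified on this exact box - it assumes the missing drive hangs off one of the ports that currently looks empty, e.g. sata1/0:

# cfgadm -c disconnect sata1/0   # drop the port (add -f if it refuses)
# cfgadm -c connect sata1/0      # re-activate the port, forcing a fresh link probe
# cfgadm -c configure sata1/0    # attach a disk node if something answered
# devfsadm -Cv                   # rebuild /dev links, then re-check format / zpool import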
[zfs-discuss] LAST CALL: zfs-discuss is moving Sunday, March 24, 2013
I hope to see everyone on the other side... *** The ZFS discussion list is moving to java.net. This opensolaris/zfs discussion will not be available after March 24. There is no way to migrate the existing list to the new list. The solaris-zfs project is here: http://java.net/projects/solaris-zfs See the steps below to join the ZFS project or just the discussion list, but you must create an account on java.net to join the list. Thanks, Cindy

1. Create an account on java.net: https://java.net/people/new
2. When logged in to your java.net account, join the solaris-zfs project as an Observer by clicking the Join This Project link on the left side of this page: http://java.net/projects/solaris-zfs
3. Subscribe to the zfs discussion mailing list here: http://java.net/projects/solaris-zfs/lists
Re: [zfs-discuss] This mailing list EOL???
mail-archive.com is an independent third party. This is one of their FAQs: http://www.mail-archive.com/faq.html#duration

  "The Mail Archive has been running since 1998. Archiving services are planned to continue indefinitely. We do not plan on ever needing to remove archived material. Do not, however, misconstrue these intentions with a warranty of any kind. We reserve the right to discontinue service at any time."

On Wednesday, March 20, 2013 5:16 PM, Deirdre Straughan wrote:
> Will the archives of all the lists be preserved? I don't think we've seen a clear answer on that (it's possible you haven't, either!).
[zfs-discuss] SSD for L2arc
Hi, Can I know how to configure an SSD to be used for L2ARC? Basically I want to improve read performance. To increase write performance, will an SSD for the ZIL help? As I read on forums, the ZIL is only used for mysql/transaction-based writes. I have regular writes only. Thanks. Regards, Ram
Re: [zfs-discuss] SSD for L2arc
> Can I know how to configure an SSD to be used for L2ARC? Basically I want to improve read performance.

Read the documentation, specifically the section titled "Creating a ZFS Storage Pool With Cache Devices".

> To increase write performance, will an SSD for the ZIL help? As I read on forums, the ZIL is only used for mysql/transaction-based writes. I have regular writes only.

That is not correct - the ZIL is used for synchronous writes. From the documentation: "The ZFS intent log (ZIL) is provided to satisfy POSIX requirements for synchronous transactions. For example, databases often require their transactions to be on stable storage devices when returning from a system call. NFS and other applications can also use fsync() to ensure data stability. By default, the ZIL is allocated from blocks within the main pool. However, better performance might be possible by using separate intent log devices, such as NVRAM or a dedicated disk."
Re: [zfs-discuss] SSD for L2arc
On 2013-03-21 16:24, Ram Chander wrote:
> Hi, Can I know how to configure an SSD to be used for L2ARC? Basically I want to improve read performance.

The zpool(1M) man page is quite informative on theory and concepts ;) If your pool already exists, you can prepare the SSD (partition/slice it) and:

# zpool add POOLNAME cache cXtYdZsS

Likewise, to add a ZIL device you can add a log device, either as a single disk (slice) or as a mirror of two or more:

# zpool add POOLNAME log cXtYdZsS
# zpool add POOLNAME log mirror cXtYdZsS1 cXtYdZsS2

> To increase write performance, will an SSD for the ZIL help? As I read on forums, the ZIL is only used for mysql/transaction-based writes. I have regular writes only.

It may increase performance in a few ways. If you have any apps (including NFS, maybe VMs, iSCSI, etc. - not only databases) that regularly issue synchronous writes - those which must be stored on media (not just cached and queued) before the call returns success - then the ZIL catches these writes instead of the main pool devices. The ZIL is written as a ring buffer, so its size is proportional to your pool's throughput: about 3 full-size TXG syncs should fit into the designated ZIL space. That's usually max bandwidth (X MB/s) times 15 sec (3*5s), or a bit more for peace of mind.

1) If the ZIL device (SLOG) is an SSD, it is presumably quick, so writes should return quickly and sync I/Os are less blocked.

2) If the SLOG is on HDD(s) separate from the main pool, then writes into the ZIL cause no mechanical seeks during normal pool I/O. Without a separate log device, the disk heads must travel to the reserved ZIL area and back - time stolen from both reads and writes in the pool. *Possibly*, fragmentation might also be reduced by having the ZIL outside of the main pool, though I may be technically wrong on that point.

3) As a *speculation*, a HDD doing nothing but SLOG duty (i.e. a hot spare with a designated slice for the ZIL, so it does something useful while waiting to replace a failed pool device) would likely also give a good boost to performance, since it won't have to seek much. The rotational latency will still be there, however, limiting reachable IOPS in comparison to an SSD SLOG.

HTH, //Jim
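To make the sizing rule above concrete, here is a rough worked example; the 500 MB/s figure is a made-up assumption, so substitute your pool's real peak synchronous-write bandwidth:

    SLOG size ~= max write bandwidth * 3 TXG intervals
              ~= 500 MB/s * (3 * 5 s) = 7500 MB, i.e. about 8 GB

So even a small, fast SSD (or a slice of one) is normally plenty for a SLOG; capacity beyond that buys nothing for the ZIL, and the rest of the device can be left unused or given to L2ARC.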
[zfs-discuss] System started crashing hard after zpool reconfigure and OI upgrade
I have two identical Supermicro boxes with 32GB of RAM. Hardware details are at the end of the message. They were running OI 151.a.5 for months. The zpool configuration was one storage zpool with 3 vdevs of 8 disks in RAIDZ2. The OI installation is absolutely clean. Just next-next-next until done. All I do is configure the network after install. I don't install or enable any other services. Then I added more disks and rebuilt the systems with OI 151.a.7, and this time configured the zpool with 6 vdevs of 5 disks in RAIDZ. The systems started crashing really badly. They just disappear from the network: black and unresponsive console, no error lights, but no activity indication either. The only way out is to power cycle the system. There is no pattern to the crashes. It may crash in 2 days, it may crash in 2 hours. I upgraded the memory on both systems to 128GB to no avail. This is the max memory they can take. In summary, all I did was upgrade to OI 151.a.7 and reconfigure the zpool. Any idea what the problem could be? Thank you -- Peter

Supermicro X9DRH-iF
Xeon E5-2620 @ 2.0 GHz 6-Core
LSI SAS9211-8i HBA
32x 3TB Hitachi HUS723030ALS640, SAS, 7.2K
Re: [zfs-discuss] System started crashing hard after zpool reconfigure and OI upgrade
Peter, sorry if this is so obvious that you didn't mention it: have you checked /var/adm/messages and other diagnostic tool output?

regards Michael
-- Michael Schuster http://recursiveramblings.wordpress.com/
Re: [zfs-discuss] System started crashing hard after zpool reconfigure and OI upgrade
Does the Supermicro IPMI show anything when it crashes? Does anything show up in event logs in the BIOS, or in system logs under OI?
Re: [zfs-discuss] System started crashing hard after zpool reconfigure and OI upgrade
I'm sorry, I should have mentioned that I can't find any errors in the logs. The last entry in /var/adm/messages is that I removed the keyboard after the last reboot, and then it shows the new boot-up messages from when I boot the system after the crash. The BIOS log is empty. I'm not sure how to check the IPMI, but IPMI is not configured and I'm not using it. Just another observation - the crashes are more intense the more data the system serves (NFS). I'm looking into firmware upgrades for the LSI now.
Re: [zfs-discuss] System started crashing hard after zpool reconfigure and OI upgrade
How about crash dumps?

michael
-- Michael Schuster http://recursiveramblings.wordpress.com/
Re: [zfs-discuss] System started crashing hard after zpool reconfigure and OI upgrade
I'm going to need some help with the crash dumps. I'm not very familiar with Solaris. Do I have to enable something to get the crash dumps? Where should I look for them? Thanks for the help.
Re: [zfs-discuss] System started crashing hard after zpool reconfigure and OI upgrade
On 2013-03-20 17:15, Peter Wood wrote:
> I'm going to need some help with the crash dumps. I'm not very familiar with Solaris. Do I have to enable something to get the crash dumps? Where should I look for them?

Typically the kernel crash dumps are created as a result of a kernel panic; they may also be forced by administrative actions like an NMI. They require you to configure a dump volume of sufficient size (see dumpadm) and a /var/crash which may be a dataset on a large enough pool - after the reboot, the dump data will be migrated there.

To help with the hangs you can try the BIOS watchdog (which would require a bmc driver; the one known from OpenSolaris is alas not open-sourced and not redistributable), or a software deadman timer:

http://www.cuddletech.com/blog/pivot/entry.php?id=1044
http://wiki.illumos.org/display/illumos/System+Hangs

Also, if you configure crash-dump-on-NMI and set up your IPMI card, then you can likely gain remote access to both the server console (physical and/or serial) and may be able to trigger the NMI, too.

HTH, //Jim
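For reference, a minimal dump setup might look like the sketch below. The zvol size is an assumption (roughly RAM-sized covers the worst case; compressed dumps are usually much smaller), so check dumpadm(1M) for your release:

# dumpadm                                 # show the current dump device and savecore settings
# zfs create -V 130g rpool/dump2          # dedicated dump zvol (assumed size for a 128GB box)
# dumpadm -d /dev/zvol/dsk/rpool/dump2    # point the dump subsystem at it
# dumpadm -s /var/crash                   # directory savecore writes to after reboot
# savecore -L                             # optional: capture a live image of the running kernel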
Re: [zfs-discuss] System started crashing hard after zpool reconfigure and OI upgrade
Hi Jim, Thanks for the pointers. I'll definitely look into this.

-- Peter Blajev IT Manager, TAAZ Inc. Office: 858-597-0512 x125
Re: [zfs-discuss] [BULK] System started crashing hard after zpool reconfigure and OI upgrade
No problem, Trey. Anything will help. Yes, I did a clean install, overwriting the old OS.

Trey had written:
> Just to make sure, you actually did an overwrite reinstall with OI151a7 rather than upgrading the existing OS images? If you did a pkg image-update, you should be able to boot back into the oi151a5 image from grub. Apologies in advance if I'm stating the obvious. -- Trey
Re: [zfs-discuss] System started crashing hard after zpool reconfigure and OI upgrade
On Wed, Mar 20, 2013 at 08:50:40AM -0700, Peter Wood wrote:
> The BIOS log is empty. I'm not sure how to check the IPMI but IPMI is not configured and I'm not using it.

You definitely should! Plug a cable into the dedicated network port and configure it (the easiest way for you is probably to jump into the BIOS and assign the appropriate IP address etc.). Then, for a quick look, point your browser at the given IP, port 80 (default login is ADMIN/ADMIN). You may also configure some other details now (accounts/passwords/roles). To track the problem, either write a script which polls the parameters in question periodically, or just install the latest ipmiViewer and use it to monitor your sensors ad hoc: ftp://ftp.supermicro.com/utility/IPMIView/

> Just another observation - the crashes are more intense the more data the system serves (NFS). I'm looking into firmware upgrades for the LSI now.

The latest LSI FW should be P15; for this MB, type 217 (2.17), MB-BIOS C28 (1.0b). However, I doubt that your problem has anything to do with the SAS controller or OI or ZFS. My guess is that either your MB is broken (we had an X9DRH-iF which instantly disappeared as soon as it got some real load) or you have a heat problem (watch your CPU temp, e.g. via ipmiviewer). With 2GHz that's not very likely, but worth a try (socket placement on this board is not really smart IMHO). To test quickly:

- disable all additional, unneeded services in OI which may put some load on the machine (like the NFS service, http and bla) and perhaps even export unneeded pools (just to be sure)
- fire up your ipmiviewer and look at the sensors (set update to 10s) or refresh manually often
- start 'openssl speed -multi 32' and keep watching your CPU temp sensors (with 2GHz I guess it takes ~12min)

I guess your machine will disappear before the CPUs get really hot (broken MB). If the CPUs switch off (usually first CPU2 and a little later CPU1), you have a cooling problem. If nothing happens, well, then it could be an OI or ZFS problem ;-)

Have fun, jel.
-- Otto-von-Guericke University http://www.cs.uni-magdeburg.de/ Department of Computer Science Geb. 29 R 027, Universitaetsplatz 2 39106 Magdeburg, Germany Tel: +49 391 67 52768
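If you would rather poll from another box than sit watching ipmiViewer, a tiny script along these lines works with the stock ipmitool client. The BMC address and the default ADMIN/ADMIN credentials here are assumptions - substitute your own:

#!/bin/sh
# Log BMC temperature sensors every 10 seconds until the box dies;
# the last timestamped block then brackets the moment of the crash.
BMC=10.0.0.17    # hypothetical IPMI address
while :; do
    date
    ipmitool -I lanplus -H $BMC -U ADMIN -P ADMIN sdr type Temperature
    sleep 10
done >> ipmi-temp.log 2>&1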
Re: [zfs-discuss] System started crashing hard after zpool reconfigure and OI upgrade
Great write-up, Jens. The chance of two MBs being broken is probably low, but overheating is a very good point. It was on my to-do list to set up IPMI, and it seems that now is the best time to do it. Thanks
[zfs-discuss] This mailing list EOL???
I can't seem to find any factual indication that opensolaris.org mailing lists are going away, and I can't even find the reference to whoever said it was EOL in a few weeks ... a few weeks ago. So ... are these mailing lists going bye-bye?
Re: [zfs-discuss] This mailing list EOL???
Hi Ned, This list is migrating to java.net and will not be available in its current form after March 24, 2013. The archive of this list is available here: http://www.mail-archive.com/zfs-discuss@opensolaris.org/ I will provide an invitation to the new list shortly. Thanks for your patience. Cindy
Re: [zfs-discuss] This mailing list EOL???
Will the archives of all the lists be preserved? I don't think we've seen a clear answer on that (it's possible you haven't, either!).

-- best regards, Deirdré Straughan Community Architect, SmartOS illumos Community Manager cell 720 371 4107
Re: [zfs-discuss] System started crashing hard after zpool reconfigure and OI upgrade
I can reproduce the problem. I can crash the system. Here are the steps I took (some steps may not be needed, but I haven't tested that):

- Clean install of OI 151.a.7 on the Supermicro hardware described above (32GB RAM though, not the 128GB)
- Create 1 zpool, 6 raidz vdevs with 5 drives each
- NFS-export a dataset:
  zfs set sharenfs=rw=@10.20.1/24 vol01/htmlspace
- Create a zfs child dataset:
  zfs create vol01/htmlspace/A
  $ zfs get -H sharenfs vol01/htmlspace/A
  vol01/htmlspace/A  sharenfs  rw=@10.20.1/24  inherited from vol01/htmlspace
- Stop NFS sharing for the child dataset:
  zfs set sharenfs=off vol01/htmlspace/A

The crash is instant after the sharenfs=off command. I thought it was a coincidence, so after reboot I tried it on another dataset. Instant crash again. I get my prompt back but that's it. The system is gone after that. The NFS-exported file systems are not accessed by any system on the network. They are not in use. That's why I wanted to stop exporting them. And even if they were in use, this shouldn't crash the system, right? I can't try the other box because it is heavily in production. At least not until later tonight. I thought I'd collect some advice to make each crash as useful as possible. Any pointers are appreciated. Thanks, -- Peter
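Since the hang is reproducible on demand, one hedged way to make the next run yield evidence - this assumes the dumpadm setup discussed earlier in the thread is already in place:

# dumpadm                                  # confirm a dump device and savecore directory are configured
# savecore -L                              # snapshot the live kernel just before triggering the bug
# zfs set sharenfs=off vol01/htmlspace/A   # trigger; if the box panics rather than hangs,
                                           # savecore writes the dump out on the next boot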
[zfs-discuss] Please join us on the new zfs discuss list on java.net
Hi Everyone, The ZFS discussion list is moving to java.net. This opensolaris/zfs discussion will not be available after March 24. There is no way to migrate the existing list to the new list. The solaris-zfs project is here: http://java.net/projects/solaris-zfs See the steps below to join the ZFS project or just the discussion list, but you must create an account on java.net to join the list. Thanks, Cindy

1. Create an account on java.net: https://java.net/people/new
2. When logged in to your java.net account, join the solaris-zfs project as an Observer by clicking the Join This Project link on the left side of this page: http://java.net/projects/solaris-zfs
3. Subscribe to the zfs discussion mailing list here: http://java.net/projects/solaris-zfs/lists
Re: [zfs-discuss] partitioned cache devices
Andrew Werchowiecki wrote:
> Thanks for the info about slices, I may give that a go later on. I'm not keen on that because I have clear evidence (as in zpools set up this way, right now, working, without issue) that GPT partitions of the style shown above work, and I want to see why it doesn't work in my set up rather than simply ignoring it and moving on.

Didn't you read Richard's post? You can have only one Solaris partition at a time. Your original example failed when you tried to add a second.

-- Ian.
Re: [zfs-discuss] partitioned cache devices
Hi Andrew, Your original syntax was incorrect. A p* device is a larger container for the d* device or s* devices. In the case of a cache device, you need to specify a d* or s* device. That you can add p* devices to a pool is a bug.

Adding different slices from c25t10d1 as both log and cache devices would need the s* identifier, but you've already added the entire c25t10d1 as the log device. A better configuration would be using c25t10d1 for log and c25t9d1 for cache, or providing some spares for this large pool. After you remove the log devices, re-add like this:

# zpool add aggr0 log c25t10d1
# zpool add aggr0 cache c25t9d1

You might review the ZFS recommended practices section, here:
http://docs.oracle.com/cd/E26502_01/html/E29007/zfspools-4.html#storage-2
See example 3-4 for adding a cache device, here:
http://docs.oracle.com/cd/E26502_01/html/E29007/gayrd.html#gazgw

Always have good backups. Thanks, Cindy

On 03/18/13 23:23, Andrew Werchowiecki wrote:
> I did something like the following:
>
> format -e /dev/rdsk/c5t0d0p0
>   fdisk
>     1 (create)
>     F (EFI)
>     6 (exit)
>   partition
>     label
>     1
>     y
>     0 usr wm 64 4194367
>     1 usr wm 4194368 117214990
>     label
>     1
>     y
>
> Total disk size is 9345 cylinders
> Cylinder size is 12544 (512 byte) blocks
>
>                                Cylinders
>   Partition   Status   Type    Start   End    Length   %
>   =========   ======   ====    =====   ====   ======   ===
>       1                EFI     0       9345   9346     100
>
> partition> print
> Current partition table (original):
> Total disk sectors available: 117214957 + 16384 (reserved sectors)
>
>   Part   Tag         Flag   First Sector   Size      Last Sector
>   0      usr         wm     64             2.00GB    4194367
>   1      usr         wm     4194368        53.89GB   117214990
>   2      unassigned  wm     0              0         0
>   3      unassigned  wm     0              0         0
>   4      unassigned  wm     0              0         0
>   5      unassigned  wm     0              0         0
>   6      unassigned  wm     0              0         0
>   8      reserved    wm     117214991      8.00MB    117231374
>
> This isn't the output from when I did it, but it is exactly the same steps that I followed. Thanks for the info about slices, I may give that a go later on. I'm not keen on that because I have clear evidence (as in zpools set up this way, right now, working, without issue) that GPT partitions of the style shown above work, and I want to see why it doesn't work in my set up rather than simply ignoring it and moving on.
>
> From: Fajar A. Nugraha, Sunday, 17 March 2013 3:04 PM:
>> On Sun, Mar 17, 2013 at 1:01 PM, Andrew Werchowiecki wrote:
>>> I understand that p0 refers to the whole disk... in the logs I pasted in I'm not attempting to mount p0. I'm trying to work out why I'm getting an error attempting to mount p2, after p1 has successfully mounted. Further, this has been done before on other systems in the same hardware configuration in the exact same fashion, and I've gone over the steps trying to make sure I haven't missed something, but can't see a fault.
>> How did you create the partitions? Are those marked as a Solaris partition, or something else (e.g. fdisk on Linux uses type 83 by default)?
>>> I'm not keen on using Solaris slices because I don't have an understanding of what that does to the pool's OS interoperability.
>> Linux can read Solaris slices and import Solaris-made pools just fine, as long as you're using a compatible zpool version (e.g. zpool version 28). -- Fajar
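Once the cache device is attached, it is easy to confirm it is actually absorbing I/O - a quick check, reusing the pool name aggr0 from this thread:

# zpool status aggr0          # verify the log and cache roles landed where intended
# zpool iostat -v aggr0 5     # per-vdev I/O every 5s; the SSD shows up under 'cache' as it warms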
Re: [zfs-discuss] What would be the best tutorial cum reference doc for ZFS
Hi Hans, Start with the ZFS Admin Guide, here: http://docs.oracle.com/cd/E26502_01/html/E29007/index.html Or, start with your specific questions. Thanks, Cindy

On 03/19/13 03:30, Hans J. Albertsson wrote:
> as used on Illumos? I've seen a few tutorials written by people who obviously are very action oriented; afterwards you find you have worn your keyboard down a bit and not learned a lot at all, at least not in the sense of understanding what zfs is and what it does and why things are the way they are. I'm looking for something that would make me afterwards understand what, say, commands like zpool import ... or zfs send ... actually do, and some idea as to why, so I can begin to understand ZFS in a way that allows me to make educated guesses on how to perform tasks I haven't tried before. And mostly without having to ask around for days on end. For SOME part of zfs I'm already there, but only for the things I had to do more than twice or so while managing the Swedish lab at Sun Micro.
Re: [zfs-discuss] What would be the best tutorial cum reference doc for ZFS
There are links to videos and other materials here: http://wiki.smartos.org/display/DOC/ZFS Not as organized as I'd like...

-- best regards, Deirdré Straughan Community Architect, SmartOS illumos Community Manager cell 720 371 4107
Re: [zfs-discuss] partioned cache devices
Andrew Werchowiecki wrote:

Total disk size is 9345 cylinders
Cylinder size is 12544 (512 byte) blocks

                                Cylinders
Partition   Status   Type       Start   End    Length   %
    1                EFI        0       9345   9346     100

You only have a p1 (and for a GPT/EFI labeled disk, you can only have p1 - no other FDISK partitions are allowed).

partition print
Current partition table (original):
Total disk sectors available: 117214957 + 16384 (reserved sectors)

Part   Tag          Flag   First Sector   Size      Last Sector
  0    usr          wm     64             2.00GB    4194367
  1    usr          wm     4194368        53.89GB   117214990
  2    unassigned   wm     0              0         0
  3    unassigned   wm     0              0         0
  4    unassigned   wm     0              0         0
  5    unassigned   wm     0              0         0
  6    unassigned   wm     0              0         0
  8    reserved     wm     117214991     8.00MB     117231374

You have an s0 and s1.

This isn't the output from when I did it but it is exactly the same steps that I followed. Thanks for the info about slices, I may give that a go later on. I'm not keen on that because I have clear evidence (as in zpools set up this way, right now, working, without issue) that GPT partitions of the style shown above work and I want to see why it doesn't work in my set up rather than simply ignoring and moving on.

You would have to blow away the partitioning you have, and create an FDISK partitioned disk (not EFI), and then create a p1 and p2 partition. (Don't use the 'partition' subcommand, which confusingly creates solaris slices.) Give the FDISK partitions a partition type which nothing will recognise, such as 'other', so that nothing will try and interpret them as OS partitions. Then you can use them as raw devices, and they should be portable between OS's which can handle FDISK partitioned devices. -- Andrew ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
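What Andrew Gabriel describes would look something like the following. The fdisk menu is interactive and its exact prompts vary between releases, and the device name is carried over from the thread, so treat this as a hedged sketch rather than a recipe:

  # format -e /dev/rdsk/c5t0d0p0
  format> fdisk
     (delete the existing EFI partition, then create two FDISK partitions
      of type "Other", sized for the slog and L2ARC ranges you want)
  format> quit
  # zpool add aggr0 log c5t0d0p1       (first FDISK partition as the slog)
  # zpool add aggr0 cache c5t0d0p2     (second FDISK partition as the L2ARC)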
Re: [zfs-discuss] partioned cache devices
On 2013-03-19 20:38, Cindy Swearingen wrote: Hi Andrew, Your original syntax was incorrect. A p* device is a larger container for the d* device or s* devices. In the case of a cache device, you need to specify a d* or s* device. That you can add p* devices to a pool is a bug. I disagree; at least, I've always thought differently: the d device is the whole-disk denomination, with a unique number for a particular controller link (c+t). The disk has some partitioning table, MBR or GPT/EFI. In these tables, partition p0 stands for the table itself (i.e. to manage partitioning), and the rest kind of depends. In the case of MBR tables, one partition may be named as having a Solaris (or Solaris2) type, and there it holds an SMI table of Solaris slices, and these slices can hold legacy filesystems or components of ZFS pools. In the case of GPT, the GPT partitions can be used directly by ZFS. However, they are also denominated as slices in ZFS and the format utility. I believe Solaris-based OSes accessing a p-named partition and an s-named slice of the same number on a GPT disk should lead to the same range of bytes on disk, but I am not really certain about this. Also, if a whole disk is given to ZFS (and for OSes other than the latest Solaris 11 this means non-rpool disks), then ZFS labels the disk as GPT and defines a partition for itself plus a small trailing partition (likely to level out discrepancies with replacement disks that might happen to be a few sectors too small). In this case ZFS reports that it uses cXtYdZ as a pool component, since it considers itself in charge of the partitioning table and its inner contents, and doesn't intend to share the disk with other usages (dual-booting and other OSes' partitions, or SLOG and L2ARC parts, etc). This also allows ZFS to influence hardware-related choices, like caching and throttling, and likely auto-expansion when LUN sizes change, by fixing up the partition table along the way, since it assumes being 100% in charge of the disk. I don't think there is a crime in trying to use the partitions (of either kind) as ZFS leaf vdevs; even the zpool(1M) manpage states that: ... The following virtual devices are supported: disk A block device, typically located under /dev/dsk. ZFS can use individual slices or partitions, though the recommended mode of operation is to use whole disks. ... This is orthogonal to the fact that there can only be one Solaris slice table, inside one partition, on MBR. AFAIK this is irrelevant on GPT/EFI - no SMI slices there. On my old home NAS with OpenSolaris I certainly did have MBR partitions on the rpool disk, intended initially for some dual-booted OSes but repurposed as L2ARC and ZIL devices for the storage pool on other disks, when I played with that technology. Didn't gain much with a single spindle ;) HTH, //Jim Klimov ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] partioned cache devices
On 03/19/13 20:27, Jim Klimov wrote: I disagree; at least, I've always thought differently: the d device is the whole disk denomination, with a unique number for a particular controller link (c+t). The disk has some partitioning table, MBR or GPT/EFI. In these tables, partition p0 stands for the table itself (i.e. to manage partitioning), p0 is the whole disk regardless of any partitioning. (Hence you can use p0 to access any type of partition table.) and the rest kind of depends. In case of MBR tables, one partition may be named as having a Solaris (or Solaris2) type, and there it holds a SMI table of Solaris slices, and these slices can hold legacy filesystems or components of ZFS pools. In case of GPT, the GPT-partitions can be used directly by ZFS. However, they are also denominated as slices in ZFS and format utility. The GPT partitioning spec requires the disk to be FDISK partitioned with just one single FDISK partition of type EFI, so that tools which predate GPT partitioning will still see such a GPT disk as fully assigned to FDISK partitions, and therefore less likely to be accidentally blown away. I believe, Solaris-based OSes accessing a p-named partition and an s-named slice of the same number on a GPT disk should lead to the same range of bytes on disk, but I am not really certain about this. No, you'll see just p0 (whole disk), and p1 (whole disk less space for the backwards compatible FDISK partitioning). Also, if a whole disk is given to ZFS (and for OSes other that the latest Solaris 11 this means non-rpool disks), then ZFS labels the disk as GPT and defines a partition for itself plus a small trailing partition (likely to level out discrepancies with replacement disks that might happen to be a few sectors too small). In this case ZFS reports that it uses cXtYdZ as a pool component, For an EFI disk, the device name without a final p* or s* component is the whole EFI partition. (It's actually the s7 slice minor device node, but the s7 is dropped from the device name to avoid the confusion we had with s2 on SMI labeled disks being the whole SMI partition.) since it considers itself in charge of the partitioning table and its inner contents, and doesn't intend to share the disk with other usages (dual-booting and other OSes' partitions, or SLOG and L2ARC parts, etc). This also allows ZFS to influence hardware-related choices, like caching and throttling, and likely auto-expansion with the changed LUN sizes by fixing up the partition table along the way, since it assumes being 100% in charge of the disk. I don't think there is a crime in trying to use the partitions (of either kind) as ZFS leaf vdevs, even the zpool(1M) manpage states that: ... The following virtual devices are supported: disk A block device, typically located under /dev/dsk. ZFS can use individual slices or partitions, though the recommended mode of operation is to use whole disks. ... Right. This is orthogonal to the fact that there can only be one Solaris slice table, inside one partition, on MBR. AFAIK this is irrelevant on GPT/EFI - no SMI slices there. There's a simpler way to think of it on x86. You always have FDISK partitioning (p1, p2, p3, p4). You can then have SMI or GPT/EFI slices (both called s0, s1, ...) in an FDISK partition of the appropriate type. With SMI labeling, s2 is by convention the whole Solaris FDISK partition (although this is not enforced). With EFI labeling, s7 is enforced as the whole EFI FDISK partition, and so the trailing s7 is dropped off the device name for clarity. 
This simplicity is brought about because the GPT spec requires that backwards compatible FDISK partitioning is included, but with just 1 partition assigned. -- Andrew ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
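For readers who want to see both layers on a live system, two standard Solaris inspection commands show the two views of the same disk; the device name is hypothetical:

  # fdisk -W - /dev/rdsk/c5t0d0p0
     (dumps the FDISK table; on a GPT disk this shows the single
      protective partition of type EFI spanning the disk)
  # prtvtoc /dev/rdsk/c5t0d0
     (lists the GPT/EFI slices - s0, s1, ..., plus the reserved s8 -
      that live inside that partition)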
Re: [zfs-discuss] partioned cache devices
On 2013-03-19 22:07, Andrew Gabriel wrote: The GPT partitioning spec requires the disk to be FDISK partitioned with just one single FDISK partition of type EFI, so that tools which predate GPT partitioning will still see such a GPT disk as fully assigned to FDISK partitions, and therefore less likely to be accidentally blown away. Okay, I guess I got entangled in terminology now ;) Anyhow, your words are not all news to me, though my write-up was likely misleading to unprepared readers... sigh... Thanks for the clarifications and deeper details that I did not know! So, we can concur that GPT does indeed include the fake MBR header with one EFI partition which addresses the smaller of 2TB (MBR limit) or disk size, minus a few sectors for the GPT housekeeping. Inside the EFI partition are defined the GPT, um, partitions (represented as slices in Solaris). This is after all a GUID *Partition* Table, and that's how parted refers to them too ;) Notably, there are also unportable tricks to fool legacy OSes and bootloaders into addressing the same byte ranges via both MBR entries (forged manually and abusing the GPT/EFI spec) and proper GPT entries, as partitions in the sense of each table. //Jim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zfs-discuss Digest, Vol 89, Issue 12
You could always use 40-gigabit between the two storage systems which would speed things dramatically, or back to back 56-gigabit IB. From: zfs-discuss-requ...@opensolaris.org Sent: Monday, March 18, 2013 11:01 PM To: zfs-discuss@opensolaris.org Subject: zfs-discuss Digest, Vol 89, Issue 12 Today's Topics: 1. Re: [zfs] Re: Petabyte pool? (Richard Yao) 2. Re: [zfs] Re: Petabyte pool? (Trey Palmer)
Re: [zfs-discuss] partioned cache devices
I did something like the following:

format -e /dev/rdsk/c5t0d0p0
fdisk
1 (create)
F (EFI)
6 (exit)
partition
label
1
y
0 usr wm 64 4194367e
1 usr wm 4194368 117214990
label
1
y

Total disk size is 9345 cylinders
Cylinder size is 12544 (512 byte) blocks

                                Cylinders
Partition   Status   Type       Start   End    Length   %
    1                EFI        0       9345   9346     100

partition print
Current partition table (original):
Total disk sectors available: 117214957 + 16384 (reserved sectors)

Part   Tag          Flag   First Sector   Size      Last Sector
  0    usr          wm     64             2.00GB    4194367
  1    usr          wm     4194368        53.89GB   117214990
  2    unassigned   wm     0              0         0
  3    unassigned   wm     0              0         0
  4    unassigned   wm     0              0         0
  5    unassigned   wm     0              0         0
  6    unassigned   wm     0              0         0
  8    reserved     wm     117214991     8.00MB     117231374

This isn't the output from when I did it but it is exactly the same steps that I followed. Thanks for the info about slices, I may give that a go later on. I'm not keen on that because I have clear evidence (as in zpools set up this way, right now, working, without issue) that GPT partitions of the style shown above work and I want to see why it doesn't work in my set up rather than simply ignoring and moving on. From: Fajar A. Nugraha [mailto:w...@fajar.net] Sent: Sunday, 17 March 2013 3:04 PM To: Andrew Werchowiecki Cc: zfs-discuss@opensolaris.org Subject: Re: [zfs-discuss] partioned cache devices On Sun, Mar 17, 2013 at 1:01 PM, Andrew Werchowiecki andrew.werchowie...@xpanse.com.au wrote: I understand that p0 refers to the whole disk... in the logs I pasted in I'm not attempting to mount p0. I'm trying to work out why I'm getting an error attempting to mount p2, after p1 has successfully mounted. Further, this has been done before on other systems in the same hardware configuration in the exact same fashion, and I've gone over the steps trying to make sure I haven't missed something but can't see a fault. How did you create the partition? Are those marked as a solaris partition, or something else (e.g. fdisk on linux uses type 83 by default)? I'm not keen on using Solaris slices because I don't have an understanding of what that does to the pool's OS interoperability. Linux can read Solaris slices and import Solaris-made pools just fine, as long as you're using a compatible zpool version (e.g. zpool version 28). -- Fajar ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] partioned cache devices
On Sun, Mar 17, 2013 at 1:01 PM, Andrew Werchowiecki andrew.werchowie...@xpanse.com.au wrote: I understand that p0 refers to the whole disk... in the logs I pasted in I'm not attempting to mount p0. I'm trying to work out why I'm getting an error attempting to mount p2, after p1 has successfully mounted. Further, this has been done before on other systems in the same hardware configuration in the exact same fashion, and I've gone over the steps trying to make sure I haven't missed something but can't see a fault. How did you create the partition? Are those marked as a solaris partition, or something else (e.g. fdisk on linux uses type 83 by default)? I'm not keen on using Solaris slices because I don't have an understanding of what that does to the pool's OS interoperability. Linux can read Solaris slices and import Solaris-made pools just fine, as long as you're using a compatible zpool version (e.g. zpool version 28). -- Fajar ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] [zfs] Re: Petabyte pool?
On 03/16/2013 12:57 AM, Richard Elling wrote: On Mar 15, 2013, at 6:09 PM, Marion Hakanson hakan...@ohsu.edu wrote: So, has anyone done this? Or come close to it? Thoughts, even if you haven't done it yourself? Don't forget about backups :-) -- richard Transferring 1 PB over a 10 gigabit link will take at least 10 days when overhead is taken into account. The backup system should have a dedicated 10 gigabit link at the minimum and using incremental send/recv will be extremely important. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
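The back-of-the-envelope arithmetic behind that 10-day figure, for anyone checking:

  1 PB = 10^15 bytes
  10 Gbit/s = 1.25 x 10^9 bytes/s at raw line rate
  10^15 / (1.25 x 10^9) = 800,000 s, i.e. about 9.3 days

So nine-plus days is the theoretical floor at 100% utilization; with protocol overhead and real-world send/recv throughput, 10 days is optimistic, which is why a dedicated link and incremental send/recv matter so much.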
Re: [zfs-discuss] [zfs] Re: Petabyte pool?
I know it's heresy these days, but given the I/O throughput you're looking for and the amount you're going to spend on disks, a T5-2 could make sense when they're released (I think) later this month. Crucial sells RAM they guarantee for use in SPARC T-series, and since you're at an edu the academic discount is 35%. So a T4-2 with 512GB RAM could be had for under $35K shortly after release, 4-5 months before the E5 Xeon was released. It seemed a surprisingly good deal to me. The T5-2 has 32x3.6GHz cores, 256 threads and ~150GB/s aggregate memory bandwidth. In my testing a T4-1 can compete with a 12-core E-5 box on I/O and memory bandwidth, and this thing is about 5 times bigger than the T4-1. It should have at least 10 PCIe's and will take 32 DIMMs minimum, maybe 64. And is likely to cost you less than $50K with aftermarket RAM. -- Trey On Mar 15, 2013, at 10:35 PM, Marion Hakanson hakan...@ohsu.edu wrote: Ray said: Using a Dell R720 head unit, plus a bunch of Dell MD1200 JBODs dual pathed to a couple of LSI SAS switches. Marion said: How many HBA's in the R720? Ray said: We have qty 2 LSI SAS 9201-16e HBA's (Dell resold[1]). Sounds similar in approach to the Aberdeen product another sender referred to, with SAS switch layout: http://www.aberdeeninc.com/images/1-up-petarack2.jpg One concern I had is that I compared our SuperMicro JBOD with 40x 4TB drives in it, connected via a dual-port LSI SAS 9200-8e HBA, to the same pool layout on a 40-slot server with 40x SATA drives in it. But the server uses no SAS expanders, instead using SAS-to-SATA octopus cables to connect the drives directly to three internal SAS HBA's (2x 9201-16i's, 1x 9211-8i). What I found was that the internal pool was significantly faster for both sequential and random I/O than the pool on the external JBOD. My conclusion was that I would not want to exceed ~48 drives on a single 8-port SAS HBA. So I thought that running the I/O of all your hundreds of drives through only two HBA's would be a bottleneck. LSI's specs say 4800MBytes/sec for an 8-port SAS HBA, but 4000MBytes/sec for that card in an x8 PCIe-2.0 slot. Sure, the newer 9207-8e is rated at 8000MBytes/sec in an x8 PCIe-3.0 slot, but it still has only the same 8 SAS ports going at 4800MBytes/sec. Yes, I know the disks probably can't go that fast. But in my tests above, the internal 40-disk pool measures 2000MBytes/sec sequential reads and writes, while the external 40-disk JBOD measures at 1500 to 1700 MBytes/sec. Not a lot slower, but significantly slower, so I do think the number of HBA's makes a difference. At the moment, I'm leaning toward piling six, eight, or ten HBA's into a server, preferably one with dual IOH's (thus two PCIe busses), and connecting dual-path JBOD's in that manner. I hadn't looked into SAS switches much, but they do look more reliable than daisy-chaining a bunch of JBOD's together. I just haven't seen how to get more bandwidth through them to a single host. Regards, Marion ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Petabyte pool?
hakan...@ohsu.edu said: I get a little nervous at the thought of hooking all that up to a single server, and am a little vague on how much RAM would be advisable, other than as much as will fit (:-). Then again, I've been waiting for something like pNFS/NFSv4.1 to be usable for gluing together multiple NFS servers into a single global namespace, without any sign of that happening anytime soon. richard.ell...@gmail.com said: NFS v4 or DFS (or even clever sysadmin + automount) offers single namespace without needing the complexity of NFSv4.1, lustre, glusterfs, etc. Been using NFSv4 since it showed up in Solaris-10 FCS, and it is true that I've been clever enough (without automount -- I like my computers to be as deterministic as possible, thank you very much :-) for our NFS clients to see a single directory-tree namespace which abstracts away the actual server/location of a particular piece of data. However, we find it starts getting hard to manage when a single project (think directory node) needs more space than their current NFS server will hold. Or perhaps what you're getting at above is even more clever than I have been to date, and is eluding me at the moment. I did see someone mention NFSv4 referrals recently, maybe that would help. Plus, believe it or not, some of our customers still insist on having the server name in their path hierarchy for some reason, like /home/mynfs1/, /home/mynfs2/, and so on. Perhaps I've just not been persuasive enough yet (:-). richard.ell...@gmail.com said: Don't forget about backups :-) I was hoping I could get by with telling them to buy two of everything. Thanks and regards, Marion ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
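On the referral idea Marion mentions: Solaris 11 ships an nfsref(1M) utility for planting NFSv4 referrals in a namespace. A hedged sketch, with hypothetical paths and server name, rather than a tested recipe:

  # on the namespace server, make /export/home/bigproj a referral to mynfs2
  nfsref add /export/home/bigproj mynfs2:/export/bigproj

NFSv4 clients that cross that point get redirected to mynfs2 transparently, which would let a project that outgrows its current server move without changing its published path.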
Re: [zfs-discuss] [zfs] Petabyte pool?
On Sat, 16 Mar 2013, Kristoffer Sheather @ CloudCentral wrote: Well, off the top of my head: 2 x Storage Heads, 4 x 10G, 256G RAM, 2 x Intel E5 CPU's 8 x 60-Bay JBOD's with 60 x 4TB SAS drives RAIDZ2 stripe over the 8 x JBOD's That should fit within 1 rack comfortably and provide 1 PB storage.. What does one do for power? What are the power requirements when the system is first powered on? Can drive spin-up be staggered between JBOD chassis? Does the server need to be powered up last so that it does not time out on the zfs import? Bob -- Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] [zfs] Petabyte pool?
On 2013-03-16 15:20, Bob Friesenhahn wrote: On Sat, 16 Mar 2013, Kristoffer Sheather @ CloudCentral wrote: Well, off the top of my head: 2 x Storage Heads, 4 x 10G, 256G RAM, 2 x Intel E5 CPU's 8 x 60-Bay JBOD's with 60 x 4TB SAS drives RAIDZ2 stripe over the 8 x JBOD's That should fit within 1 rack comfortably and provide 1 PB storage.. What does one do for power? What are the power requirements when the system is first powered on? Can drive spin-up be staggered between JBOD chassis? Does the server need to be powered up last so that it does not time out on the zfs import? I guess you can use managed PDUs like those from APC (many models for varied socket types and amounts); they can be scripted at an advanced level, and at a basic level per-socket delays can simply be configured to stagger startup once power arrives from the wall (UPS), regardless of what the boxes' individual power supplies can do. Conveniently, they also allow a remote hard-reset of hung boxes without walking to the server room ;) My 2c, //Jim Klimov ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] [zfs] Petabyte pool?
On 2013-03-16 15:20, Bob Friesenhahn wrote: On Sat, 16 Mar 2013, Kristoffer Sheather @ CloudCentral wrote: Well, off the top of my head: 2 x Storage Heads, 4 x 10G, 256G RAM, 2 x Intel E5 CPU's 8 x 60-Bay JBOD's with 60 x 4TB SAS drives RAIDZ2 stripe over the 8 x JBOD's That should fit within 1 rack comfortably and provide 1 PB storage.. What does one do for power? What are the power requirements when the system is first powered on? Can drive spin-up be staggered between JBOD chassis? Does the server need to be powered up last so that it does not time out on the zfs import? Giving this question a second thought, I think JBODs should spin-up quickly (i.e. when power is given) while the server head(s) take time to pass POST, initialize their HBAs and other stuff. Booting 8 JBODs, one every 15 seconds to complete a typical spin-up power draw, would take a couple of minutes. It is likely that a server booted along with the first JBOD won't get to importing the pool this quickly ;) Anyhow, with such a system attention should be given to redundant power and cooling, including redundant UPSes preferably fed from different power lines going into the room. This does not seem like a fantastic power sucker, however. 480 drives at 15W would consume 7200W; add a bit for processor/RAM heads (perhaps a kW?) and this would still fit into 8-10kW, so a couple of 15kVA UPSes (or more smaller ones) should suffice including redundancy. This might overall exceed a rack in size though. But for power/cooling this seems like a standard figure for a 42U rack or just a bit more. //Jim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] [zfs] Petabyte pool?
On Sat, Mar 16, 2013 at 2:27 PM, Jim Klimov jimkli...@cos.ru wrote: On 2013-03-16 15:20, Bob Friesenhahn wrote: On Sat, 16 Mar 2013, Kristoffer Sheather @ CloudCentral wrote: Well, off the top of my head: 2 x Storage Heads, 4 x 10G, 256G RAM, 2 x Intel E5 CPU's 8 x 60-Bay JBOD's with 60 x 4TB SAS drives RAIDZ2 stripe over the 8 x JBOD's That should fit within 1 rack comfortably and provide 1 PB storage.. What does one do for power? What are the power requirements when the system is first powered on? Can drive spin-up be staggered between JBOD chassis? Does the server need to be powered up last so that it does not time out on the zfs import? I guess you can use managed PDUs like those from APC (many models for varied socket types and amounts); they can be scripted on an advanced level, and on a basic level I think delays can be just configured per-socket to make the staggered startup after giving power from the wall (UPS) regardless of what the boxes' individual power sources can do. Conveniently, they also allow to do a remote hard-reset of hung boxes without walking to the server room ;) My 2c, //Jim Klimov Any modern JBOD should have the intelligence built in to stagger drive spin-up. I wouldn't spend money on one that didn't. There's really no need to stagger the JBOD power-up at the PDU. As for the head, yes it should have a delayed power on which you can typically set in the BIOS. --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Petabyte pool?
I just recently built an OpenIndiana 151a7 system that is currently 1/2 PB and will be expanded to 1 PB as we collect imaging data for the Human Connectome Project at Washington University in St. Louis. It is very much like your use case, as this is an offsite backup system that will write once and read rarely. It has displaced a BlueArc DR system because their mechanisms for syncing over distances could not keep up with our data generation rate. The fact that it cost 5x as much per TB as homebrew helped the decision also. It is currently 180 4TB SAS Seagate Constellations in 4 Supermicro JBODs. The JBODs are currently in two branches, cascading only once. When expanded, 4 JBODs will be on each branch. The pool is configured as 9 vdevs of 19 drives in raidz3. The remaining disks are configured as hot spares. Only metadata is cached in 128GB RAM and 2 480GB Intel 520 SSDs for L2ARC. Sync (ZIL) is turned off since the worst that would happen is that we would need to rerun an rsync job. Two identical servers were built for a cold standby configuration. Since it is a DR system, the need for a hot standby was ruled out, since even several hours of downtime would not be an issue. Each server is fitted with 2 LSI 9207-8e HBAs configured as redundant multipath to the JBODs. Before putting it into service I ran several iozone tests to benchmark the pool. Even with really fat vdevs the performance is impressive. If you're interested in that data let me know. It has many hours of idle time each day, so additional performance tests are not out of the question either. Actually, I should say I designed and configured the system. The system was assembled by a colleague at UMINN. If you would like more details on the hardware, I have a very detailed assembly doc I wrote and would be happy to share. The system receives daily rsyncs from our production BlueArc system. The rsyncs are split into 120 parallel rsync jobs. This overcomes the latency slowdown TCP suffers from, and we see total throughput between 500-700Mb/s. The BlueArc has 120TB of 15k SAS tiered to NL-SAS. All metadata is on the SAS pool. The ZFS system outpaces the BlueArc on metadata when rsync does its tree walk. Given all the safeguards built into ZFS, I would not hesitate to build a production system at the multi-petabyte scale. If a channel to the disks is no longer available, it will simply stop writing and the data will be safe. Given the redundant paths, power supplies, etc, the odds of that happening are very low. The single points of failure left when running a single server remain at the motherboard, CPU and RAM level. Build a hot standby server and human error becomes the most likely failure. -Chip On Fri, Mar 15, 2013 at 8:09 PM, Marion Hakanson hakan...@ohsu.edu wrote: Greetings, Has anyone out there built a 1-petabyte pool? I've been asked to look into this, and was told low performance is fine, workload is likely to be write-once, read-occasionally, archive storage of gene sequencing data. Probably a single 10Gbit NIC for connectivity is sufficient. We've had decent success with the 45-slot, 4U SuperMicro SAS disk chassis, using 4TB nearline SAS drives, giving over 100TB usable space (raidz3). Back-of-the-envelope might suggest stacking up eight to ten of those, depending if you want a raw marketing petabyte, or a proper power-of-two usable petabyte. I get a little nervous at the thought of hooking all that up to a single server, and am a little vague on how much RAM would be advisable, other than as much as will fit (:-).
Then again, I've been waiting for something like pNFS/NFSv4.1 to be usable for gluing together multiple NFS servers into a single global namespace, without any sign of that happening anytime soon. So, has anyone done this? Or come close to it? Thoughts, even if you haven't done it yourself? Thanks and regards, Marion ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
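Chip's 120-way rsync split can be approximated with GNU xargs; a minimal sketch, assuming top-level directories are an even-enough unit of work (paths, host, and job count are illustrative):

  # one rsync per top-level directory, 120 running at a time
  ls /pool/source | xargs -P 120 -I{} \
      rsync -a /pool/source/{}/ backuphost:/pool/dest/{}/

Many parallel TCP streams hide the per-stream latency ceiling, which is the effect behind the 500-700Mb/s aggregate he reports.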
Re: [zfs-discuss] partioned cache devices
It's a home set up, the performance penalty from splitting the cache devices is non-existent, and that workaround sounds like a pretty crazy amount of overhead where I could instead just have a mirrored slog. I'm less concerned about wasted space, more concerned about the number of SAS ports I have available. I understand that p0 refers to the whole disk... in the logs I pasted in I'm not attempting to mount p0. I'm trying to work out why I'm getting an error attempting to mount p2, after p1 has successfully mounted. Further, this has been done before on other systems in the same hardware configuration in the exact same fashion, and I've gone over the steps trying to make sure I haven't missed something but can't see a fault. I'm not keen on using Solaris slices because I don't have an understanding of what that does to the pool's OS interoperability. From: Edward Ned Harvey (opensolarisisdeadlongliveopensolaris) [opensolarisisdeadlongliveopensola...@nedharvey.com] Sent: Friday, 15 March 2013 8:44 PM To: Andrew Werchowiecki; zfs-discuss@opensolaris.org Subject: RE: partioned cache devices From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- boun...@opensolaris.org] On Behalf Of Andrew Werchowiecki muslimwookie@Pyzee:~$ sudo zpool add aggr0 cache c25t10d1p2 Password: cannot open '/dev/dsk/c25t10d1p2': I/O error muslimwookie@Pyzee:~$ I have two SSDs in the system, I've created an 8gb partition on each drive for use as a mirrored write cache. I also have the remainder of the drive partitioned for use as the read only cache. However, when attempting to add it I get the error above. Sounds like you're probably running into confusion about how to partition the drive. If you create fdisk partitions, they will be accessible as p0, p1, p2, but I think p0 unconditionally refers to the whole drive, so the first partition is p1, and the second is p2. If you create one big Solaris fdisk partition and then slice it via partition, where s2 is typically the encompassing slice, and people usually use s1 and s2 and s6 for actual slices, then they will be accessible via s1, s2, s6 Generally speaking, it's unadvisable to split the slog/cache devices anyway. Because: If you're splitting it, evidently you're focusing on the wasted space. Buying an expensive 128G device where you couldn't possibly ever use more than 4G or 8G in the slog. But that's not what you should be focusing on. You should be focusing on the speed (that's why you bought it in the first place.) The slog is write-only, and the cache is a mixture of read/write, where it should be hopefully doing more reads than writes. But regardless of your actual success with the cache device, your cache device will be busy most of the time, and competing against the slog. You have a mirror, you say. You should probably drop both the cache and log. Use one whole device for the cache, use one whole device for the log. The only risk you'll run is: Since a slog is write-only (except during mount, typically at boot) it's possible to have a failure mode where you think you're writing to the log, but the first time you go back and read, you discover an error, and discover the device has gone bad. In other words, without ever doing any reads, you might not notice when/if the device goes bad. Fortunately, there's an easy workaround.
You could periodically (say, once a month) script the removal of your log device, create a junk pool, write a bunch of data to it, scrub it (thus verifying it was written correctly) and in the absence of any scrub errors, destroy the junk pool and re-add the device as a slog to the main pool. I've never heard of anyone actually being that paranoid, and I've never heard of anyone actually experiencing the aforementioned possible undetected device failure mode. So this is all mostly theoretical. Mirroring the slog device really isn't necessary in the modern age. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
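A hedged sketch of that monthly check; the pool and device names are invented for illustration, and the status grep is a loose heuristic rather than a robust test:

  #!/bin/sh
  # Monthly slog verification; "tank" and c1t5d0 are illustrative names.
  zpool remove tank c1t5d0               # detach the slog from the main pool
  zpool create junk c1t5d0               # reuse it as a throwaway pool
  dd if=/dev/urandom of=/junk/testfile bs=1024k count=2048   # write ~2 GB
  zpool scrub junk                       # re-read and checksum what was written
  while zpool status junk | grep 'in progress' > /dev/null; do
      sleep 10                           # wait for the scrub to finish
  done
  zpool status -x junk                   # "all pools are healthy" means no errors
  zpool destroy junk
  zpool add tank log c1t5d0              # put the device back as the slog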
Re: [zfs-discuss] partioned cache devices
On Mar 16, 2013, at 7:01 PM, Andrew Werchowiecki andrew.werchowie...@xpanse.com.au wrote: It's a home set up, the performance penalty from splitting the cache devices is non-existent, and that workaround sounds like a pretty crazy amount of overhead where I could instead just have a mirrored slog. I'm less concerned about wasted space, more concerned about the number of SAS ports I have available. I understand that p0 refers to the whole disk... in the logs I pasted in I'm not attempting to mount p0. I'm trying to work out why I'm getting an error attempting to mount p2, after p1 has successfully mounted. Further, this has been done before on other systems in the same hardware configuration in the exact same fashion, and I've gone over the steps trying to make sure I haven't missed something but can't see a fault. You can have only one Solaris partition at a time. Ian already shared the answer: create one 100% Solaris partition and then use format to create two slices. -- richard I'm not keen on using Solaris slices because I don't have an understanding of what that does to the pool's OS interoperability. From: Edward Ned Harvey (opensolarisisdeadlongliveopensolaris) [opensolarisisdeadlongliveopensola...@nedharvey.com] Sent: Friday, 15 March 2013 8:44 PM To: Andrew Werchowiecki; zfs-discuss@opensolaris.org Subject: RE: partioned cache devices From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- boun...@opensolaris.org] On Behalf Of Andrew Werchowiecki muslimwookie@Pyzee:~$ sudo zpool add aggr0 cache c25t10d1p2 Password: cannot open '/dev/dsk/c25t10d1p2': I/O error muslimwookie@Pyzee:~$ I have two SSDs in the system, I've created an 8gb partition on each drive for use as a mirrored write cache. I also have the remainder of the drive partitioned for use as the read only cache. However, when attempting to add it I get the error above. Sounds like you're probably running into confusion about how to partition the drive. If you create fdisk partitions, they will be accessible as p0, p1, p2, but I think p0 unconditionally refers to the whole drive, so the first partition is p1, and the second is p2. If you create one big Solaris fdisk partition and then slice it via partition, where s2 is typically the encompassing slice, and people usually use s1 and s2 and s6 for actual slices, then they will be accessible via s1, s2, s6 Generally speaking, it's unadvisable to split the slog/cache devices anyway. Because: If you're splitting it, evidently you're focusing on the wasted space. Buying an expensive 128G device where you couldn't possibly ever use more than 4G or 8G in the slog. But that's not what you should be focusing on. You should be focusing on the speed (that's why you bought it in the first place.) The slog is write-only, and the cache is a mixture of read/write, where it should be hopefully doing more reads than writes. But regardless of your actual success with the cache device, your cache device will be busy most of the time, and competing against the slog. You have a mirror, you say. You should probably drop both the cache and log. Use one whole device for the cache, use one whole device for the log. The only risk you'll run is: Since a slog is write-only (except during mount, typically at boot) it's possible to have a failure mode where you think you're writing to the log, but the first time you go back and read, you discover an error, and discover the device has gone bad. In other words, without ever doing any reads, you might not notice when/if the device goes bad.
Fortunately, there's an easy workaround. You could periodically (say, once a month) script the removal of your log device, create a junk pool, write a bunch of data to it, scrub it (thus verifying it was written correctly) and in the absence of any scrub errors, destroy the junk pool and re-add the device as a slog to the main pool. I've never heard of anyone actually being that paranoid, and I've never heard of anyone actually experiencing the aforementioned possible undetected device failure mode. So this is all mostly theoretical. Mirroring the slog device really isn't necessary in the modern age. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss -- ZFS and performance consulting http://www.RichardElling.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] partioned cache devices
Andrew Werchowiecki wrote: Hi all, I'm having some trouble with adding cache drives to a zpool, anyone got any ideas? muslimwookie@Pyzee:~$ sudo zpool add aggr0 cache c25t10d1p2 Password: cannot open '/dev/dsk/c25t10d1p2': I/O error muslimwookie@Pyzee:~$ I have two SSDs in the system, I've created an 8gb partition on each drive for use as a mirrored write cache. I also have the remainder of the drive partitioned for use as the read only cache. However, when attempting to add it I get the error above. Create one 100% Solaris partition and then use format to create two slices. -- Ian. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
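A hedged sketch of Ian's suggestion, using the device names from the thread; the 8 GB slog size comes from Andrew's description, everything else is assumption, and the partition menu prompts interactively for tag, flag, start, and size:

  # format -e c25t10d1
  format> fdisk
     (create one Solaris2 partition covering 100% of the disk)
  format> partition
  partition> 0        (slice 0: tag usr, ~8 GB, for the slog)
  partition> 1        (slice 1: the remainder of the disk, for the L2ARC)
  partition> label
  partition> quit
  # zpool add aggr0 log c25t10d1s0
  # zpool add aggr0 cache c25t10d1s1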
Re: [zfs-discuss] partioned cache devices
From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- boun...@opensolaris.org] On Behalf Of Andrew Werchowiecki muslimwookie@Pyzee:~$ sudo zpool add aggr0 cache c25t10d1p2 Password: cannot open '/dev/dsk/c25t10d1p2': I/O error muslimwookie@Pyzee:~$ I have two SSDs in the system, I've created an 8gb partition on each drive for use as a mirrored write cache. I also have the remainder of the drive partitioned for use as the read only cache. However, when attempting to add it I get the error above. Sounds like you're probably running into confusion about how to partition the drive. If you create fdisk partitions, they will be accessible as p0, p1, p2, but I think p0 unconditionally refers to the whole drive, so the first partition is p1, and the second is p2. If you create one big Solaris fdisk partition and then slice it via partition, where s2 is typically the encompassing slice, and people usually use s1 and s2 and s6 for actual slices, then they will be accessible via s1, s2, s6 Generally speaking, it's unadvisable to split the slog/cache devices anyway. Because: If you're splitting it, evidently you're focusing on the wasted space. Buying an expensive 128G device where you couldn't possibly ever use more than 4G or 8G in the slog. But that's not what you should be focusing on. You should be focusing on the speed (that's why you bought it in the first place.) The slog is write-only, and the cache is a mixture of read/write, where it should be hopefully doing more reads than writes. But regardless of your actual success with the cache device, your cache device will be busy most of the time, and competing against the slog. You have a mirror, you say. You should probably drop both the cache and log. Use one whole device for the cache, use one whole device for the log. The only risk you'll run is: Since a slog is write-only (except during mount, typically at boot) it's possible to have a failure mode where you think you're writing to the log, but the first time you go back and read, you discover an error, and discover the device has gone bad. In other words, without ever doing any reads, you might not notice when/if the device goes bad. Fortunately, there's an easy workaround. You could periodically (say, once a month) script the removal of your log device, create a junk pool, write a bunch of data to it, scrub it (thus verifying it was written correctly) and in the absence of any scrub errors, destroy the junk pool and re-add the device as a slog to the main pool. I've never heard of anyone actually being that paranoid, and I've never heard of anyone actually experiencing the aforementioned possible undetected device failure mode. So this is all mostly theoretical. Mirroring the slog device really isn't necessary in the modern age. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Sun X4200 Question...
Thanks for the info. I am planning the install this weekend, between Formula One and other hardware upgrades... fingers crossed it works! On 14 Mar 2013 09:19, Heiko L. h.lehm...@hs-lausitz.de wrote: support for VT, but nothing for AMD... The Opterons don't have VT, so I won't be using XEN, but the Zones may be useful... We have used XEN/PV on the X4200 for many years without problems. dom0: X4200+openindiana+xvm guests(PV): openindiana,linux/fedora,linux/debian (vmlinuz-2.6.32.28-xenU-32,vmlinuz-2.6.18-xenU64) regards Heiko ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Petabyte pool?
Greetings, Has anyone out there built a 1-petabyte pool? I've been asked to look into this, and was told low performance is fine, workload is likely to be write-once, read-occasionally, archive storage of gene sequencing data. Probably a single 10Gbit NIC for connectivity is sufficient. We've had decent success with the 45-slot, 4U SuperMicro SAS disk chassis, using 4TB nearline SAS drives, giving over 100TB usable space (raidz3). Back-of-the-envelope might suggest stacking up eight to ten of those, depending if you want a raw marketing petabyte, or a proper power-of-two usable petabyte. I get a little nervous at the thought of hooking all that up to a single server, and am a little vague on how much RAM would be advisable, other than as much as will fit (:-). Then again, I've been waiting for something like pNFS/NFSv4.1 to be usable for gluing together multiple NFS servers into a single global namespace, without any sign of that happening anytime soon. So, has anyone done this? Or come close to it? Thoughts, even if you haven't done it yourself? Thanks and regards, Marion ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
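For concreteness, the arithmetic behind the eight-to-ten estimate; the ~110 TB usable figure is an assumption consistent with the "over 100TB" above:

  45 slots x 4 TB = 180 TB raw per chassis
  raidz3 vdevs plus spares -> roughly 110 TB usable per chassis
  8-10 chassis -> ~0.9-1.1 PB usable, i.e. a marketing petabyte (10^15 bytes)
  a power-of-two petabyte (2^50 bytes = ~1.13 x 10^15) needs the top of that range or a bit beyond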
Re: [zfs-discuss] Petabyte pool?
On Fri, Mar 15, 2013 at 06:09:34PM -0700, Marion Hakanson wrote: Greetings, Has anyone out there built a 1-petabyte pool? I've been asked to look into this, and was told low performance is fine, workload is likely to be write-once, read-occasionally, archive storage of gene sequencing data. Probably a single 10Gbit NIC for connectivity is sufficient. We've had decent success with the 45-slot, 4U SuperMicro SAS disk chassis, using 4TB nearline SAS drives, giving over 100TB usable space (raidz3). Back-of-the-envelope might suggest stacking up eight to ten of those, depending if you want a raw marketing petabyte, or a proper power-of-two usable petabyte. I get a little nervous at the thought of hooking all that up to a single server, and am a little vague on how much RAM would be advisable, other than as much as will fit (:-). Then again, I've been waiting for something like pNFS/NFSv4.1 to be usable for gluing together multiple NFS servers into a single global namespace, without any sign of that happening anytime soon. So, has anyone done this? Or come close to it? Thoughts, even if you haven't done it yourself? Thanks and regards, Marion We've come close:

admin@mes-str-imgnx-p1:~$ zpool list
NAME       SIZE   ALLOC   FREE   CAP   DEDUP   HEALTH   ALTROOT
datapool   978T   298T    680T   30%   1.00x   ONLINE   -
syspool    278G   104G    174G   37%   1.00x   ONLINE   -

Using a Dell R720 head unit, plus a bunch of Dell MD1200 JBODs dual pathed to a couple of LSI SAS switches. Using Nexenta but no reason you couldn't do this w/ $whatever. We did triple parity and our vdev membership is set up such that we can lose up to three JBODs and still be functional (one vdev member disk per JBOD). This is with 3TB NL-SAS drives. Ray ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
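Ray's "one vdev member disk per JBOD" remark implies a layout like the following sketch; the JBOD count, controller numbers, and disk names are invented for illustration:

  # each raidz3 vdev takes exactly one disk from each JBOD (c1..c11 here),
  # so three whole JBODs can fail and every vdev stays within raidz3's
  # three-disk redundancy margin
  zpool create datapool \
    raidz3 c1t0d0 c2t0d0 c3t0d0 c4t0d0 c5t0d0 c6t0d0 \
           c7t0d0 c8t0d0 c9t0d0 c10t0d0 c11t0d0 \
    raidz3 c1t1d0 c2t1d0 c3t1d0 c4t1d0 c5t1d0 c6t1d0 \
           c7t1d0 c8t1d0 c9t1d0 c10t1d0 c11t1d0
  # ...and so on, one vdev per target row across the JBODs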
Re: [zfs-discuss] [zfs] Petabyte pool?
Well, off the top of my head: 2 x Storage Heads, 4 x 10G, 256G RAM, 2 x Intel E5 CPU's 8 x 60-Bay JBOD's with 60 x 4TB SAS drives RAIDZ2 stripe over the 8 x JBOD's That should fit within 1 rack comfortably and provide 1 PB storage.. Regards, Kristoffer Sheather Cloud Central Scale Your Data Center In The Cloud Phone: 1300 144 007 | Mobile: +61 414 573 130 | Email: k...@cloudcentral.com.au LinkedIn: | Skype: kristoffer.sheather | Twitter: http://twitter.com/kristofferjon From: Marion Hakanson hakan...@ohsu.edu Sent: Saturday, March 16, 2013 12:12 PM To: z...@lists.illumos.org Subject: [zfs] Petabyte pool? Greetings, Has anyone out there built a 1-petabyte pool? I've been asked to look into this, and was told low performance is fine, workload is likely to be write-once, read-occasionally, archive storage of gene sequencing data. Probably a single 10Gbit NIC for connectivity is sufficient. We've had decent success with the 45-slot, 4U SuperMicro SAS disk chassis, using 4TB nearline SAS drives, giving over 100TB usable space (raidz3). Back-of-the-envelope might suggest stacking up eight to ten of those, depending if you want a raw marketing petabyte, or a proper power-of-two usable petabyte. I get a little nervous at the thought of hooking all that up to a single server, and am a little vague on how much RAM would be advisable, other than as much as will fit (:-). Then again, I've been waiting for something like pNFS/NFSv4.1 to be usable for gluing together multiple NFS servers into a single global namespace, without any sign of that happening anytime soon. So, has anyone done this? Or come close to it? Thoughts, even if you haven't done it yourself? Thanks and regards, Marion ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] [zfs] Petabyte pool?
Actually, you could use 3TB drives and with a 6/8 RAIDZ2 stripe achieve 1080 TB usable. You'll also need 8-16 x SAS ports available on each storage head to provide redundant multi-pathed SAS connectivity to the JBOD's, recommend LSI 9207-8E's for those and Intel X520-DA2's for the 10G NIC's. From: Kristoffer Sheather @ CloudCentral kristoffer.sheat...@cloudcentral.com.au Sent: Saturday, March 16, 2013 12:21 PM To: z...@lists.illumos.org Subject: re: [zfs] Petabyte pool? Well, off the top of my head: 2 x Storage Heads, 4 x 10G, 256G RAM, 2 x Intel E5 CPU's 8 x 60-Bay JBOD's with 60 x 4TB SAS drives RAIDZ2 stripe over the 8 x JBOD's That should fit within 1 rack comfortably and provide 1 PB storage.. Regards, Kristoffer Sheather Cloud Central Scale Your Data Center In The Cloud Phone: 1300 144 007 | Mobile: +61 414 573 130 | Email: k...@cloudcentral.com.au LinkedIn: | Skype: kristoffer.sheather | Twitter: http://twitter.com/kristofferjon From: Marion Hakanson hakan...@ohsu.edu Sent: Saturday, March 16, 2013 12:12 PM To: z...@lists.illumos.org Subject: [zfs] Petabyte pool? Greetings, Has anyone out there built a 1-petabyte pool? I've been asked to look into this, and was told low performance is fine, workload is likely to be write-once, read-occasionally, archive storage of gene sequencing data. Probably a single 10Gbit NIC for connectivity is sufficient. We've had decent success with the 45-slot, 4U SuperMicro SAS disk chassis, using 4TB nearline SAS drives, giving over 100TB usable space (raidz3). Back-of-the-envelope might suggest stacking up eight to ten of those, depending if you want a raw marketing petabyte, or a proper power-of-two usable petabyte. I get a little nervous at the thought of hooking all that up to a single server, and am a little vague on how much RAM would be advisable, other than as much as will fit (:-). Then again, I've been waiting for something like pNFS/NFSv4.1 to be usable for gluing together multiple NFS servers into a single global namespace, without any sign of that happening anytime soon. So, has anyone done this? Or come close to it? Thoughts, even if you haven't done it yourself? Thanks and regards, Marion ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
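The arithmetic behind that 1080 TB figure checks out:

  8 JBODs x 60 drives = 480 drives
  raidz2 in 6+2 stripes -> 6/8 of raw capacity is usable
  480 x 3 TB x 6/8 = 1080 TB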
Re: [zfs-discuss] Petabyte pool?
On Fri, Mar 15, 2013 at 7:09 PM, Marion Hakanson hakan...@ohsu.edu wrote: Has anyone out there built a 1-petabyte pool? I'm not advising against your building/configuring a system yourself, but I suggest taking look at the Petarack: http://www.aberdeeninc.com/abcatg/petarack.htm It shows it's been done with ZFS :-). Jan ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Petabyte pool?
rvandol...@esri.com said: We've come close:

admin@mes-str-imgnx-p1:~$ zpool list
NAME       SIZE   ALLOC   FREE   CAP   DEDUP   HEALTH   ALTROOT
datapool   978T   298T    680T   30%   1.00x   ONLINE   -
syspool    278G   104G    174G   37%   1.00x   ONLINE   -

Using a Dell R720 head unit, plus a bunch of Dell MD1200 JBODs dual pathed to a couple of LSI SAS switches. Thanks Ray, We've been looking at those too (we've had good luck with our MD1200's). How many HBA's in the R720? Thanks and regards, Marion ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Petabyte pool?
On Fri, Mar 15, 2013 at 06:31:11PM -0700, Marion Hakanson wrote: rvandol...@esri.com said: We've come close:

admin@mes-str-imgnx-p1:~$ zpool list
NAME       SIZE   ALLOC   FREE   CAP   DEDUP   HEALTH   ALTROOT
datapool   978T   298T    680T   30%   1.00x   ONLINE   -
syspool    278G   104G    174G   37%   1.00x   ONLINE   -

Using a Dell R720 head unit, plus a bunch of Dell MD1200 JBODs dual pathed to a couple of LSI SAS switches. Thanks Ray, We've been looking at those too (we've had good luck with our MD1200's). How many HBA's in the R720? Thanks and regards, Marion We have qty 2 LSI SAS 9201-16e HBA's (Dell resold[1]). Ray [1] http://accessories.us.dell.com/sna/productdetail.aspx?c=us&l=en&s=hied&cs=65&sku=a4614101 ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Petabyte pool?
Ray said: Using a Dell R720 head unit, plus a bunch of Dell MD1200 JBODs dual pathed to a couple of LSI SAS switches. Marion said: How many HBA's in the R720? Ray said: We have qty 2 LSI SAS 9201-16e HBA's (Dell resold[1]). Sounds similar in approach to the Aberdeen product another sender referred to, with SAS switch layout: http://www.aberdeeninc.com/images/1-up-petarack2.jpg One concern I had is that I compared our SuperMicro JBOD with 40x 4TB drives in it, connected via a dual-port LSI SAS 9200-8e HBA, to the same pool layout on a 40-slot server with 40x SATA drives in it. But the server uses no SAS expanders, instead using SAS-to-SATA octopus cables to connect the drives directly to three internal SAS HBA's (2x 9201-16i's, 1x 9211-8i). What I found was that the internal pool was significantly faster for both sequential and random I/O than the pool on the external JBOD. My conclusion was that I would not want to exceed ~48 drives on a single 8-port SAS HBA. So I thought that running the I/O of all your hundreds of drives through only two HBA's would be a bottleneck. LSI's specs say 4800MBytes/sec for an 8-port SAS HBA, but 4000MBytes/sec for that card in an x8 PCIe-2.0 slot. Sure, the newer 9207-8e is rated at 8000MBytes/sec in an x8 PCIe-3.0 slot, but it still has only the same 8 SAS ports going at 4800MBytes/sec. Yes, I know the disks probably can't go that fast. But in my tests above, the internal 40-disk pool measures 2000MBytes/sec sequential reads and writes, while the external 40-disk JBOD measures at 1500 to 1700 MBytes/sec. Not a lot slower, but significantly slower, so I do think the number of HBA's makes a difference. At the moment, I'm leaning toward piling six, eight, or ten HBA's into a server, preferably one with dual IOH's (thus two PCIe busses), and connecting dual-path JBOD's in that manner. I hadn't looked into SAS switches much, but they do look more reliable than daisy-chaining a bunch of JBOD's together. I just haven't seen how to get more bandwidth through them to a single host. Regards, Marion ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
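The bandwidth figures Marion quotes decompose cleanly:

  8 SAS-2 ports x 600 MB/s (6 Gbit/s less 8b/10b encoding) = 4800 MB/s on the SAS side
  PCIe 2.0 x8 = 8 x 500 MB/s = 4000 MB/s, the host-side ceiling for the 9200-8e
  PCIe 3.0 x8 = ~8000 MB/s, so on a 9207-8e the eight SAS ports become the ceiling again

Two such HBAs therefore top out around 8-9.6 GB/s in theory, and less in practice, which is what supports spreading the load over six or more HBAs across both IOHs.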
Re: [zfs-discuss] Petabyte pool?
On Mar 15, 2013, at 6:09 PM, Marion Hakanson hakan...@ohsu.edu wrote: Greetings, Has anyone out there built a 1-petabyte pool? Yes, I've done quite a few. I've been asked to look into this, and was told low performance is fine, workload is likely to be write-once, read-occasionally, archive storage of gene sequencing data. Probably a single 10Gbit NIC for connectivity is sufficient. We've had decent success with the 45-slot, 4U SuperMicro SAS disk chassis, using 4TB nearline SAS drives, giving over 100TB usable space (raidz3). Back-of-the-envelope might suggest stacking up eight to ten of those, depending if you want a raw marketing petabyte, or a proper power-of-two usable petabyte. Yes. NB, for the PHB, using N^2 is found 2B less effective than N^10. I get a little nervous at the thought of hooking all that up to a single server, and am a little vague on how much RAM would be advisable, other than as much as will fit (:-). Then again, I've been waiting for something like pNFS/NFSv4.1 to be usable for gluing together multiple NFS servers into a single global namespace, without any sign of that happening anytime soon. NFS v4 or DFS (or even clever sysadmin + automount) offers single namespace without needing the complexity of NFSv4.1, lustre, glusterfs, etc. So, has anyone done this? Or come close to it? Thoughts, even if you haven't done it yourself? Don't forget about backups :-) -- richard -- richard.ell...@richardelling.com +1-760-896-4422 ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
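For the capacity side, a minimal sketch of the chassis math (purely illustrative assumptions: three 15-drive raidz3 vdevs per 45-slot chassis and 4TB drives; actual layouts vary):

  DRIVE_TB=4; VDEVS=3; DRIVES=15; PARITY=3
  echo $(( VDEVS * (DRIVES - PARITY) * DRIVE_TB ))   # 144 marketing TB usable per chassis

Eight chassis of that shape give roughly 1.15 marketing PB, or about 1.0 PiB before pool overhead, which matches the eight-to-ten estimate.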
Re: [zfs-discuss] Sun X4200 Question...
support for VT, but nothing for AMD... The Opterons dont have VT, so i wont be using XEN, but the Zones may be useful... We have been using XEN/PV on X4200 for many years without problems. dom0: X4200 + openindiana + xvm. Guests (PV): openindiana, linux/fedora, linux/debian (vmlinuz-2.6.32.28-xenU-32, vmlinuz-2.6.18-xenU64). Regards, Heiko ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Sun X4200 Question...
On 2013-03-11 21:50, Bob Friesenhahn wrote: On Mon, 11 Mar 2013, Tiernan OToole wrote: I know this might be the wrong place to ask, but hopefully someone can point me in the right direction... I got my hands on a Sun x4200. Its the original one, not the M2, and has 2 single core Opterons, 4Gb RAM and 4 73Gb SAS Disks... But, I dont know what to install on it... I was thinking of SmartOS, but the site mentions Intel support for VT, but nothing for AMD... The Opterons dont have VT, so i wont be using XEN, but the Zones may be useful... OpenIndiana or OmniOS seem like the most likely candidates. You can run VirtualBox on OpenIndiana and it should be able to work without VT extensions. Also note that without the extensions VirtualBox has some quirks. Most notably, lack of acceleration and of virtual SMP support. But unlike some other virtualizers, it should work (it does work for us on a Thumper, also with pre-VTx Opteron CPUs). However, recently the VM virtual hardware clocks became way slow. I am at a loss so far; the forum was moderately helpful - probably the load on the host and the latencies it induces play a role. But the problem does happen on more modern hardware too, so VTx (or the lack of it) shouldn't be our reason... //Jim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Sun X4200 Question...
On Mar 14, 2013, at 5:55 PM, Jim Klimov jimkli...@cos.ru wrote: However, recently the VM virtual hardware clocks became way slow. Does NTP help correct the guest's clock? ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Sun X4200 Question...
On 2013-03-15 01:58, Gary Driggs wrote: On Mar 14, 2013, at 5:55 PM, Jim Klimov jimkli...@cos.ru wrote: However, recently the VM virtual hardware clocks became way slow. Does NTP help correct the guest's clock? Unfortunately no, neither guest NTP, ntpdate or rdate in crontabs, nor VirtualBox timesync settings, alone or even combined for test (though known to conflict) - nothing has definitely helped so far. We also have some setups on rather not-loaded hardware where after a few days of uptime the clock stalls to the point that it has a groundhog day - rotating over the same 2-3 second range for hours, until the VM is powered off and booted. Conversely, we also have dozens of VMs (and a few hosts) where no such problems occur. Weird stuff... //Jim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] partioned cache devices
Hi all, I'm having some trouble with adding cache drives to a zpool, anyone got any ideas?

muslimwookie@Pyzee:~$ sudo zpool add aggr0 cache c25t10d1p2
Password:
cannot open '/dev/dsk/c25t10d1p2': I/O error
muslimwookie@Pyzee:~$

I have two SSDs in the system. I've created an 8gb partition on each drive for use as a mirrored write cache. I also have the remainder of each drive partitioned for use as the read-only cache. However, when attempting to add it I get the error above. Here's a zpool status:

  pool: aggr0
 state: DEGRADED
status: One or more devices is currently being resilvered. The pool will continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Thu Feb 21 21:13:45 2013
        1.13T scanned out of 20.0T at 106M/s, 51h52m to go
        74.2G resilvered, 5.65% done
config:

        NAME                         STATE     READ WRITE CKSUM
        aggr0                        DEGRADED     0     0     0
          raidz2-0                   DEGRADED     0     0     0
            c7t5000C50035CA68EDd0    ONLINE       0     0     0
            c7t5000C5003679D3E2d0    ONLINE       0     0     0
            c7t50014EE2B16BC08Bd0    ONLINE       0     0     0
            c7t50014EE2B174216Dd0    ONLINE       0     0     0
            c7t50014EE2B174366Bd0    ONLINE       0     0     0
            c7t50014EE25C1E7646d0    ONLINE       0     0     0
            c7t50014EE25C17A62Cd0    ONLINE       0     0     0
            c7t50014EE25C17720Ed0    ONLINE       0     0     0
            c7t50014EE206C2AFD1d0    ONLINE       0     0     0
            c7t50014EE206C8E09Fd0    ONLINE       0     0     0
            c7t50014EE602DFAACAd0    ONLINE       0     0     0
            c7t50014EE602DFE701d0    ONLINE       0     0     0
            c7t50014EE20677C1C1d0    ONLINE       0     0     0
            replacing-13             UNAVAIL      0     0     0
              c7t50014EE6031198C1d0  UNAVAIL      0     0     0  cannot open
              c7t50014EE0AE2AB006d0  ONLINE       0     0     0  (resilvering)
            c7t50014EE65835480Dd0    ONLINE       0     0     0
        logs
          mirror-1                   ONLINE       0     0     0
            c25t10d1p1               ONLINE       0     0     0
            c25t9d1p1                ONLINE       0     0     0

errors: No known data errors

As you can see, I've successfully added the 8gb partitions as the mirrored write cache (log devices). Interestingly, when I do a zpool iostat -v it shows the total as 111gb:

                             capacity     operations    bandwidth
pool                       alloc   free   read  write   read  write
-------------------------  -----  -----  -----  -----  -----  -----
aggr0                      20.0T  7.27T  1.33K    139  81.7M  4.19M
  raidz2                   20.0T  7.27T  1.33K    115  81.7M  2.70M
    c7t5000C50035CA68EDd0      -      -    566      9  6.91M   241K
    c7t5000C5003679D3E2d0      -      -    493      8  6.97M   242K
    c7t50014EE2B16BC08Bd0      -      -    544      9  7.02M   239K
    c7t50014EE2B174216Dd0      -      -    525      9  6.94M   241K
    c7t50014EE2B174366Bd0      -      -    540      9  6.95M   241K
    c7t50014EE25C1E7646d0      -      -    549      9  7.02M   239K
    c7t50014EE25C17A62Cd0      -      -    534      9  6.93M   241K
    c7t50014EE25C17720Ed0      -      -    542      9  6.95M   241K
    c7t50014EE206C2AFD1d0      -      -    549      9  7.02M   239K
    c7t50014EE206C8E09Fd0      -      -    526     10  6.94M   241K
    c7t50014EE602DFAACAd0      -      -    576     10  6.91M   241K
    c7t50014EE602DFE701d0      -      -    591     10  7.00M   239K
    c7t50014EE20677C1C1d0      -      -    530     10  6.95M   241K
    replacing                  -      -      0    922      0  7.11M
      c7t50014EE6031198C1d0    -      -      0      0      0      0
      c7t50014EE0AE2AB006d0    -      -      0    622      2  7.10M
    c7t50014EE65835480Dd0      -      -    595     10  6.98M   239K
logs                           -      -      -      -      -      -
  mirror                    740K   111G      0     43      0  2.75M
    c25t10d1p1                 -      -      0     43      3  2.75M
    c25t9d1p1                  -      -      0     43      3  2.75M
-------------------------  -----  -----  -----  -----  -----  -----
rpool                      7.32G  12.6G      2      4  41.9K  43.2K
  c4t0d0s0                 7.32G  12.6G      2      4  41.9K  43.2K
-------------------------  -----  -----  -----  -----  -----  -----

Something funky is going on here... Wooks ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
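One thing worth checking here (a sketch, not a diagnosis; device names follow the example above): whether the p2 fdisk partition the error names actually exists in the partition table, since an undefined p* device is one plausible way to get a "cannot open ... I/O error" on zpool add:

  # Dump the fdisk partition table from the whole-disk p0 device:
  fdisk -W - /dev/rdsk/c25t10d1p0
  # If the SSDs use VTOC slices instead, inspect those with:
  prtvtoc /dev/rdsk/c25t10d1s2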
[zfs-discuss] Sun X4200 Question...
I know this might be the wrong place to ask, but hopefully someone can point me in the right direction... I got my hands on a Sun x4200. Its the original one, not the M2, and has 2 single core Opterons, 4Gb RAM and 4 73Gb SAS Disks... But, I dont know what to install on it... I was thinking of SmartOS, but the site mentions Intel support for VT, but nothing for AMD... The Opterons dont have VT, so i wont be using XEN, but the Zones may be useful... Any advice? Thanks! -- Tiernan O'Toole blog.lotas-smartman.net www.geekphotographer.com www.tiernanotoole.ie ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Sun X4200 Question...
On Mon, 11 Mar 2013, Tiernan OToole wrote: I know this might be the wrong place to ask, but hopefully someone can point me in the right direction... I got my hands on a Sun x4200. Its the original one, not the M2, and has 2 single core Opterons, 4Gb RAM and 4 73Gb SAS Disks... But, I dont know what to install on it... I was thinking of SmartOS, but the site mentions Intel support for VT, but nothing for AMD... The Opterons dont have VT, so i wont be using XEN, but the Zones may be useful... OpenIndiana or OmniOS seem like the most likely candidates. You can run VirtualBox on OpenIndiana and it should be able to work without VT extensions. Bob -- Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer,http://www.GraphicsMagick.org/___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Sun X4200 Question...
to tell you the truth, i dont really need the virtualization stuff... Zones sounds interesting, since it seems to be lighter weight than Xen or anything like that... On Mon, Mar 11, 2013 at 8:50 PM, Bob Friesenhahn bfrie...@simple.dallas.tx.us wrote: On Mon, 11 Mar 2013, Tiernan OToole wrote: I know this might be the wrong place to ask, but hopefully someone can point me in the right direction... I got my hands on a Sun x4200. Its the original one, not the M2, and has 2 single core Opterons, 4Gb RAM and 4 73Gb SAS Disks... But, I dont know what to install on it... I was thinking of SmartOS, but the site mentions Intel support for VT, but nothing for AMD... The Opterons dont have VT, so i wont be using XEN, but the Zones may be useful... OpenIndiana or OmniOS seem like the most likely candidates. You can run VirtualBox on OpenIndiana and it should be able to work without VT extensions. Bob -- Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/ -- Tiernan O'Toole blog.lotas-smartman.net www.geekphotographer.com www.tiernanotoole.ie ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Huge Numbers of Illegal Requests
On Tue, 5 Mar 2013, Ed Shipe wrote: On 2 different OpenIndiana 151a7 systems, Im showing a huge number of Illegal Requests. There are no other apparent issues, performance is fine, etc,etc.Everything works great - what are these illegal requests? My Google-Foo is failing me... My system used to exhibit this problem so I opened Illumos issue 2998 (https://www.illumos.org/issues/2998). The weird thing is that the problem went away and has not returned. Bob -- Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer,http://www.GraphicsMagick.org/___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS Distro Advice
>> We do the same for all of our legacy operating system backups. Take a snapshot then do an rsync and an excellent way of maintaining incremental backups for those. Magic rsync options used: -a --inplace --no-whole-file --delete-excluded This causes rsync to overwrite the file blocks in place rather than writing to a new temporary file first. As a result, zfs COW produces primitive deduplication of at least the unchanged blocks (by writing nothing) while writing new COW blocks for the changed blocks.
>
> If I understand your use case correctly (the application overwrites some blocks with the same exact contents), ZFS will ignore these no-

I think he meant to rely on rsync here to do in-place updates of files and only for changed blocks with the above parameters (by using rsync's own delta mechanism). So if you have a file and only one block changed, rsync will overwrite on the destination only that single block.

> op writes only on recent Open ZFS (illumos / FreeBSD / Linux) builds with checksum=sha256 and compression!=off. AFAIK, Solaris ZFS will COW the blocks even if their content is identical to what's already there, causing the snapshots to diverge. See https://www.illumos.org/issues/3236 for details.

This is interesting. I didn't know about it. Is there an option similar to verify=on in dedup, or does it just assume that the checksum is your data?

-- Robert Milkowski http://milek.blogspot.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS Distro Advice
On Mon, 4 Mar 2013, Matthew Ahrens wrote: Magic rsync options used: -a --inplace --no-whole-file --delete-excluded This causes rsync to overwrite the file blocks in place rather than writing to a new temporary file first. As a result, zfs COW produces primitive deduplication of at least the unchanged blocks (by writing nothing) while writing new COW blocks for the changed blocks. If I understand your use case correctly (the application overwrites some blocks with the same exact contents), ZFS will ignore these no-op writes only on recent Open ZFS (illumos / FreeBSD / Linux) builds with checksum=sha256 and compression!=off. AFAIK, Solaris ZFS will COW the blocks even if their content is identical to what's already there, causing the snapshots to diverge. With these rsync options, rsync will only overwrite a block if the contents of the block has changed. Rsync's notion of a block is different than zfs so there is not a perfect overlap. Rsync does need to read files on the destination filesystem to see if they have changed. If the system has sufficient RAM (and/or L2ARC) then files may still be cached from the previous day's run. In most cases only a small subset of the total files are updated (at least on my systems) so the caching requirements are small. Files updated on one day are more likely to be the ones updated on subsequent days. Bob -- Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer,http://www.GraphicsMagick.org/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
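A minimal sketch of the snapshot-then-rsync cycle being described (dataset and path names are hypothetical):

  # Preserve yesterday's state, then overwrite changed blocks in place:
  zfs snapshot backup/legacy@$(date +%Y-%m-%d)
  rsync -a --inplace --no-whole-file --delete-excluded /data/ /backup/legacy/

Each snapshot then references the old copies of only those blocks rsync actually rewrote, which is what yields the COW-based primitive deduplication.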
Re: [zfs-discuss] ZFS Distro Advice
On Tue, March 5, 2013 10:02, Bob Friesenhahn wrote: Rsync does need to read files on the destination filesystem to see if they have changed. If the system has sufficient RAM (and/or L2ARC) then files may still be cached from the previous day's run. In most cases only a small subset of the total files are updated (at least on my systems) so the caching requirements are small. Files updated on one day are more likely to be the ones updated on subsequent days.

It's also possible to reduce the amount that rsync has to walk the entire file tree. Most folks simply do a rsync --options /my/source/ /the/dest/, but if you use zfs diff, and parse/feed the output of that to rsync, then the amount of thrashing can probably be minimized. Especially useful for file hierarchies that have very many individual files, so you don't have to stat() every single one. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS Distro Advice
On 3/5/2013 9:40 AM, David Magda wrote:

> On Tue, March 5, 2013 10:02, Bob Friesenhahn wrote:
>> Rsync does need to read files on the destination filesystem to see if they have changed. If the system has sufficient RAM (and/or L2ARC) then files may still be cached from the previous day's run. In most cases only a small subset of the total files are updated (at least on my systems) so the caching requirements are small. Files updated on one day are more likely to be the ones updated on subsequent days.
>
> It's also possible to reduce the amount that rsync has to walk the entire file tree. Most folks simply do a rsync --options /my/source/ /the/dest/, but if you use zfs diff, and parse/feed the output of that to rsync, then the amount of thrashing can probably be minimized. Especially useful for file hierarchies that have very many individual files, so you don't have to stat() every single one.

David,

Your idea to use zfs diff to limit the need to stat the entire filesystem tree intrigues me. My current rsync backups are normally limited by this very factor. It takes longer to walk the filesystem tree than it does to transfer the new data.

Would you be willing to provide an example of what you mean when you say parse/feed the output of zfs diff to rsync?

Russ Poyner ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS Distro Advice
On Tue, 5 Mar 2013, David Magda wrote: It's also possible to reduce the amount that rsync has to walk the entire file tree. Most folks simply do a rsync --options /my/source/ /the/dest/, but if you use zfs diff, and parse/feed the output of that to rsync, then the amount of thrashing can probably be minimized. Especially useful for file hierarchies that very many individual files, so you don't have to stat() every single one. Zfs diff only works for zfs filesystems. If one is using zfs filesystems then rsync may not be the best option. In the real world, data may be sourced from many types of systems and filesystems. Bob -- Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer,http://www.GraphicsMagick.org/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS Distro Advice
On 3/5/2013 10:27 AM, Bob Friesenhahn wrote: On Tue, 5 Mar 2013, David Magda wrote: It's also possible to reduce the amount that rsync has to walk the entire file tree. Most folks simply do a rsync --options /my/source/ /the/dest/, but if you use zfs diff, and parse/feed the output of that to rsync, then the amount of thrashing can probably be minimized. Especially useful for file hierarchies that very many individual files, so you don't have to stat() every single one. Zfs diff only works for zfs filesystems. If one is using zfs filesystems then rsync may not be the best option. In the real world, data may be sourced from many types of systems and filesystems. Bob Bob, Good point. Clearly this wouldn't work for my current linux fileserver. I'm building a replacement that will run FreeBSD 9.1 with a zfs storage pool. My backups are to a thumper running solaris 10 and zfs in another department. I have an arm's-length collaboration with the department that runs the thumper, which likely precludes a direct zfs send. Rsync has allowed us to transfer data without getting too deep into each others' system administration. I run an rsync daemon with read only access to my filesystem that accepts connections from the thumper. They serve the backups to me via a read-only nfs export. The only problem has been the iops load generated by my users' millions of small files. That's why the zfs diff idea excited me, but perhaps I'm missing some simpler approach. Russ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS Distro Advice
On Tue, March 5, 2013 11:17, Russ Poyner wrote: Your idea to use zfs diff to limit the need to stat the entire filesystem tree intrigues me. My current rsync backups are normally limited by this very factor. It takes longer to walk the filesystem tree than it does to transfer the new data. Would you be willing to provide an example of what you mean when you say parse/feed the output of zfs diff to rsync? Don't have anything readily available, or a ZFS system handy to hack something up. The output of zfs diff is roughly:

M       /myfiles/
M       /myfiles/link_to_me (+1)
R       /myfiles/rename_me -> /myfiles/renamed
-       /myfiles/delete_me
+       /myfiles/new_file

Take the second column and use that as the list of files to check. Solaris' zfs(1M) has an -F option which would output something like:

M       /       /myfiles/
M       F       /myfiles/link_to_me (+1)
R       F       /myfiles/rename_me -> /myfiles/renamed
-       F       /myfiles/delete_me
+       F       /myfiles/new_file
+       |       /myfiles/new_pipe

So the second column now has a type, and the path is pushed over to the third column. This way you can simply choose files (F) and tell rsync to check those. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
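A rough illustration of the parse/feed step (snapshot and host names hypothetical; this sketch handles only modified and new files, ignores renames and deletions, and will break on paths containing whitespace):

  # Collect paths of modified (M) and new (+) entries since the last snapshot:
  zfs diff tank/data@yesterday tank/data@today | awk '$1 == "M" || $1 == "+" { print $2 }' > /tmp/changed.list
  # Paths in the list are absolute, so use / as the rsync source:
  rsync -a --files-from=/tmp/changed.list / backuphost:/backup/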
Re: [zfs-discuss] ZFS Distro Advice
On Tue, Feb 26, 2013 at 7:42 PM, Bob Friesenhahn bfrie...@simple.dallas.tx.us wrote:
> On Wed, 27 Feb 2013, Ian Collins wrote:
>> I am finding that rsync with the right options (to directly block-overwrite) plus zfs snapshots is providing me with pretty amazing deduplication for backups without even enabling deduplication in zfs. Now backup storage goes a very long way.
>
> We do the same for all of our legacy operating system backups. Take a snapshot then do an rsync and an excellent way of maintaining incremental backups for those. Magic rsync options used: -a --inplace --no-whole-file --delete-excluded This causes rsync to overwrite the file blocks in place rather than writing to a new temporary file first. As a result, zfs COW produces primitive deduplication of at least the unchanged blocks (by writing nothing) while writing new COW blocks for the changed blocks.

If I understand your use case correctly (the application overwrites some blocks with the same exact contents), ZFS will ignore these no-op writes only on recent Open ZFS (illumos / FreeBSD / Linux) builds with checksum=sha256 and compression!=off. AFAIK, Solaris ZFS will COW the blocks even if their content is identical to what's already there, causing the snapshots to diverge. See https://www.illumos.org/issues/3236 for details.

commit 80901aea8e78a2c20751f61f01bebd1d5b5c2ba5
Author: George Wilson george.wil...@delphix.com
Date:   Tue Nov 13 14:55:48 2012 -0800

    3236 zio nop-write

--matt ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
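For reference, a sketch of the dataset settings under which the nop-write code Matt references can kick in (dataset name hypothetical; lz4 assumes a recent build, any compression other than off should do):

  zfs set checksum=sha256 backup/legacy
  zfs set compression=lz4 backup/legacy

As I understand it, both need to be in effect when blocks are written; blocks that already exist with fletcher checksums are not nop-write candidates until they are rewritten.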
[zfs-discuss] Huge Numbers of Illegal Requests
On 2 different OpenIndiana 151a7 systems, I'm showing a huge number of Illegal Requests. There are no other apparent issues, performance is fine, etc, etc. Everything works great - what are these illegal requests? My Google-Foo is failing me... Thanks, -ed

root@NAPP1:~# iostat -Ensr
c6t0d0,Soft Errors: 0,Hard Errors: 0,Transport Errors: 0
Vendor: SanDisk,Product: Extreme,Revision: 0001,Serial No:
Size: 16.01GB <16013942784 bytes>,Media Error: 0,Device Not Ready: 0,No Device: 0,Recoverable: 0
Illegal Request: 333,Predictive Failure Analysis: 0
c4t0d0,Soft Errors: 0,Hard Errors: 0,Transport Errors: 0
Vendor: ATA,Product: SanDisk SDSSDX24,Revision: R211,Serial No: 121562402168
Size: 240.06GB <240057409536 bytes>,Media Error: 0,Device Not Ready: 0,No Device: 0,Recoverable: 0
Illegal Request: 992096,Predictive Failure Analysis: 0
c4t1d0,Soft Errors: 0,Hard Errors: 0,Transport Errors: 0
Vendor: ATA,Product: SanDisk SDSSDX24,Revision: R211,Serial No: 121562401118
Size: 240.06GB <240057409536 bytes>,Media Error: 0,Device Not Ready: 0,No Device: 0,Recoverable: 0
Illegal Request: 992064,Predictive Failure Analysis: 0
c4t2d0,Soft Errors: 0,Hard Errors: 0,Transport Errors: 0
Vendor: ATA,Product: SanDisk SDSSDX24,Revision: R211,Serial No: 121562401215
Size: 240.06GB <240057409536 bytes>,Media Error: 0,Device Not Ready: 0,No Device: 0,Recoverable: 0
Illegal Request: 992063,Predictive Failure Analysis: 0
c4t3d0,Soft Errors: 0,Hard Errors: 0,Transport Errors: 0
Vendor: ATA,Product: SanDisk SDSSDX24,Revision: R211,Serial No: 121562401014
Size: 240.06GB <240057409536 bytes>,Media Error: 0,Device Not Ready: 0,No Device: 0,Recoverable: 0
Illegal Request: 992063,Predictive Failure Analysis: 0
c4t5d0,Soft Errors: 0,Hard Errors: 0,Transport Errors: 0
Vendor: ATA,Product: INTEL SSDSC2CT12,Revision: 300i,Serial No: CVMP219200MZ120
Size: 120.03GB <120034123776 bytes>,Media Error: 0,Device Not Ready: 0,No Device: 0,Recoverable: 0
Illegal Request: 1983773,Predictive Failure Analysis: 0
c3t0d0,Soft Errors: 0,Hard Errors: 0,Transport Errors: 0
Vendor: ATA,Product: ST2000DM001-9YN1,Revision: CC4B,Serial No: S240F3KN
Size: 2000.40GB <2000398934016 bytes>,Media Error: 0,Device Not Ready: 0,No Device: 0,Recoverable: 0
Illegal Request: 992072,Predictive Failure Analysis: 0
c3t1d0,Soft Errors: 0,Hard Errors: 0,Transport Errors: 0
Vendor: ATA,Product: ST2000DM001-9YN1,Revision: CC4B,Serial No: S240F2TN
Size: 2000.40GB <2000398934016 bytes>,Media Error: 0,Device Not Ready: 0,No Device: 0,Recoverable: 0
Illegal Request: 992031,Predictive Failure Analysis: 0
c3t2d0,Soft Errors: 0,Hard Errors: 0,Transport Errors: 0
Vendor: ATA,Product: ST2000DM001-9YN1,Revision: CC4B,Serial No: Z1E0K3C9
Size: 2000.40GB <2000398934016 bytes>,Media Error: 0,Device Not Ready: 0,No Device: 0,Recoverable: 0
Illegal Request: 992019,Predictive Failure Analysis: 0
c3t3d0,Soft Errors: 0,Hard Errors: 0,Transport Errors: 0
Vendor: ATA,Product: ST2000DM001-9YN1,Revision: CC4B,Serial No: W1E0FL39
Size: 2000.40GB <2000398934016 bytes>,Media Error: 0,Device Not Ready: 0,No Device: 0,Recoverable: 0
Illegal Request: 992016,Predictive Failure Analysis: 0
root@NAPP1:~#

-- Ed Shipe Candelabra Computing Inc e...@candelabracomputing.com Mobile: 410-929-2597 ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] SVM ZFS
On 02/26/13 20:30, Morris Hooten wrote: Besides copying data from /dev/md/dsk/x volume manager filesystems to new zfs filesystems does anyone know of any zfs conversion tools to make the conversion/migration from svm to zfs easier? With Solaris 11 you can use shadow migration; it is really a VFS layer feature but it is integrated into the ZFS CLI tools for ease of use: # zfs create -o shadow=file:///path/to/old mypool/new The new filesystem will appear to instantly have all the data, and it will be copied over as it is accessed, with shadowd pulling it over in advance as well. You can use shadowstat(1M) to show progress. -- Darren J Moffat ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
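A slightly fuller sketch of that flow for an SVM source (device, mountpoint, and dataset names are hypothetical; as I understand it the source filesystem should be mounted read-only for the migration):

  mount -F ufs -o ro /dev/md/dsk/d10 /old/export   # the SVM source, read-only
  zfs create -o shadow=file:///old/export datapool/export
  shadowstat                                        # watch migration progress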
Re: [zfs-discuss] ZFS Distro Advice
How is the quality of the ZFS Linux port today? Is it comparable to Illumos or at least FreeBSD? Can I trust production data to it? On Wed, Feb 27, 2013 at 5:22 AM, Bob Friesenhahn bfrie...@simple.dallas.tx.us wrote: On Tue, 26 Feb 2013, Gary Driggs wrote: On Feb 26, 2013, at 12:44 AM, Sašo Kiselkov wrote: I'd also recommend that you go and subscribe to z...@lists.illumos.org, since this list is going to get shut down by Oracle next month. Whose description still reads, everything ZFS running on illumos-based distributions. Even FreeBSD's zfs is now based on zfs from Illumos. FreeBSD and Linux zfs developers contribute fixes back to zfs in Illumos. Bob -- Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS Distro Advice
On 02/27/2013 12:32 PM, Ahmed Kamal wrote: How is the quality of the ZFS Linux port today? Is it comparable to Illumos or at least FreeBSD? Can I trust production data to it? Can't speak from personal experience, but a colleague of mine has been running the PPA builds on Ubuntu and has had, well, a less than stellar experience. It shows promise, but I'm not sure it's there yet. Cheers, -- Saso ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS Distro Advice
I've been using it since rc13. It's been stable for me as long as you don't get into things like zvols and such... -Original Message- From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Sašo Kiselkov Sent: Wednesday, February 27, 2013 6:37 AM To: zfs-discuss@opensolaris.org Subject: Re: [zfs-discuss] ZFS Distro Advice On 02/27/2013 12:32 PM, Ahmed Kamal wrote: How is the quality of the ZFS Linux port today? Is it comparable to Illumos or at least FreeBSD ? Can I trust production data to it ? Can't speak from personal experience, but a colleague of mine has been PPA builds on Ubuntu and has had, well, less than stellar experience. It shows promise, but I'm not sure it's there yet. Cheers, -- Saso ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS Distro Advice
On Wed, 27 Feb 2013, Ian Collins wrote:
>> Magic rsync options used: -a --inplace --no-whole-file --delete-excluded This causes rsync to overwrite the file blocks in place rather than writing to a new temporary file first. As a result, zfs COW produces primitive deduplication of at least the unchanged blocks (by writing nothing) while writing new COW blocks for the changed blocks.
>
> Do these options impact performance or reduce the incremental stream sizes?

I don't see any adverse impact on performance, and incremental stream size is quite considerably reduced. The main risk is that if the disk fills up you may end up with a corrupted file rather than just an rsync error. However, the snapshots help because an earlier version of the file is likely available.

> I just use -a --delete and the snapshots don't take up much space (compared with the incremental stream sizes).

That is what I used to do before I learned better.

Bob -- Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS Distro Advice
On Wed, Feb 27, 2013 at 2:57 AM, Dan Swartzendruber dswa...@druber.comwrote: I've been using it since rc13. It's been stable for me as long as you don't get into things like zvols and such... Then it definitely isn't at the level of FreeBSD, and personally I would not consider that production ready. --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS Distro Advice
On 2/27/2013 2:05 PM, Tim Cook wrote: On Wed, Feb 27, 2013 at 2:57 AM, Dan Swartzendruber dswa...@druber.com mailto:dswa...@druber.com wrote: I've been using it since rc13. It's been stable for me as long as you don't get into things like zvols and such... Then it definitely isn't at the level of FreeBSD, and personally I would not consider that production ready. Everyone has to make their own risk assessment. Keep in mind, it is described as a release candidate. I understand zvols are an important feature, but I can do without them, so I am... ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] SVM ZFS
Hi Darren. You're right! With Solaris 11 and the shadow migration feature it's fantastic. Not sure which Solaris version we are talking about here. Alfredo On Wed, Feb 27, 2013 at 10:22 PM, Darren J Moffat darr...@opensolaris.org wrote: On 02/26/13 20:30, Morris Hooten wrote: Besides copying data from /dev/md/dsk/x volume manager filesystems to new zfs filesystems does anyone know of any zfs conversion tools to make the conversion/migration from svm to zfs easier? With Solaris 11 you can use shadow migration; it is really a VFS layer feature but it is integrated into the ZFS CLI tools for ease of use: # zfs create -o shadow=file:///path/to/old mypool/new The new filesystem will appear to instantly have all the data, and it will be copied over as it is accessed, with shadowd pulling it over in advance as well. You can use shadowstat(1M) to show progress. -- Darren J Moffat ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss -- Alfredo ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS Distro Advice
Thanks all! I will check out FreeNAS and see what it can do... I will also check my RAID Card and see if it can work with JBOD... fingers crossed... The machine has a couple internal SATA ports (think there are 2, could be 4) so i was thinking of using those for boot disks and SSDs later... As a follow up question: Data Deduplication: The machine, to start, will have about 5Gb RAM. I read somewhere that 20TB storage would require about 8GB RAM, depending on block size... Since i dont know block sizes, yet (i store a mix of VMs, TV Shows, Movies and backups on the NAS) I am not sure how much memory i will need (my estimate is 10TB RAW (8TB usable?) in a ZRAID1 pool, and then 3TB RAW in a striped pool). If i dont have enough memory now, can i enable DeDupe at a later stage when i add memory? Also, if i pick FreeBSD now, and want to move to, say, Nexenta, is that possible? Assuming the drives are just JBOD drives (to be confirmed) could they just get imported? Thanks. On Mon, Feb 25, 2013 at 6:11 PM, Tim Cook t...@cook.ms wrote: On Mon, Feb 25, 2013 at 7:57 AM, Volker A. Brandt v...@bb-c.de wrote: Tim Cook writes: I need something that will allow me to share files over SMB (3 if possible), NFS, AFP (for Time Machine) and iSCSI. Ideally, i would like something i can manage easily and something that works with the Dell... All of them should provide the basic functionality you're looking for. None of them will provide SMB3 (at all) or AFP (without a third party package). FreeNAS has AFP built-in, including a Time Machine discovery method. The latest FreeNAS is still based on Samba 3.x, but they are aware of 4.x and will probably integrate it at some point in the future. Then you should have SMB3. I don't know how far along they are... Best regards -- Volker FreeNAS comes with a package pre-installed to add AFP support. There is no native AFP support in FreeBSD and by association FreeNAS. --Tim -- Tiernan O'Toole blog.lotas-smartman.net www.geekphotographer.com www.tiernanotoole.ie ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS Distro Advice
On 02/26/2013 09:33 AM, Tiernan OToole wrote: As a follow up question: Data Deduplication: The machine, to start, will have about 5Gb RAM. I read somewhere that 20TB storage would require about 8GB RAM, depending on block size... The typical wisdom is that 1TB of dedup'ed data = 1GB of RAM. 5GB of RAM seems too small for a 20TB pool of dedup'ed data. Unless you know what you're doing, I'd go with just compression and let dedup be - compression has known performance and doesn't suffer with scaling. If i dont have enough memory now, can i enable DeDupe at a later stage when i add memory? Yes. Also, if i pick FreeBSD now, and want to move to, say, Nexenta, is that possible? Assuming the drives are just JBOD drives (to be confirmed) could they just get imported? Yes, that's the whole point of open storage. I'd also recommend that you go and subscribe to z...@lists.illumos.org, since this list is going to get shut down by Oracle next month. Cheers, -- Saso ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
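A rough sketch of where such numbers come from (the ~320 bytes per in-core DDT entry is a figure often quoted on this list; treat it and the average block size as assumptions):

  DATA_TB=20; BLOCK_KB=128; ENTRY_B=320
  # entries = data size / average block size; RAM = entries * entry size
  echo $(( DATA_TB * 1024 * 1024 * 1024 / BLOCK_KB * ENTRY_B / 1024 / 1024 / 1024 ))  # ~50 (GiB)

At 128K average blocks that is already around 50 GiB of DDT for 20TB of unique data, and smaller blocks make it proportionally worse, which is why 5GB of RAM is nowhere near enough.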
Re: [zfs-discuss] ZFS Distro Advice
On Mon, Feb 25, 2013 at 10:33 PM, Tiernan OToole lsmart...@gmail.comwrote: Thanks all! I will check out FreeNAS and see what it can do... I will also check my RAID Card and see if it can work with JBOD... fingers crossed... The machine has a couple internal SATA ports (think there are 2, could be 4) so i was thinking of using those for boot disks and SSDs later... As a follow up question: Data Deduplication: The machine, to start, will have about 5Gb RAM. I read somewhere that 20TB storage would require about 8GB RAM, depending on block size... Since i dont know block sizes, yet (i store a mix of VMs, TV Shows, Movies and backups on the NAS) I am not sure how much memory i will need (my estimate is 10TB RAW (8TB usable?) in a ZRAID1 pool, and then 3TB RAW in a striped pool). If i dont have enough memory now, can i enable DeDupe at a later stage when i add memory? Also, if i pick FreeBSD now, and want to move to, say, Nexenta, is that possible? Assuming the drives are just JBOD drives (to be confirmed) could they just get imported? Thanks. Yes, you can move between FreeBSD and Illumos based distros as long as you are at a compatible zpool version (which they currently are). I'd avoid deduplication unless you absolutely need it... it's still a bit of a kludge. Stick to compression and your world will be a much happier place. --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS Distro Advice
Thanks again lads. I will take all that info into advice, and will join that new group also! Thanks again! --Tiernan On Tue, Feb 26, 2013 at 8:44 AM, Tim Cook t...@cook.ms wrote: On Mon, Feb 25, 2013 at 10:33 PM, Tiernan OToole lsmart...@gmail.comwrote: Thanks all! I will check out FreeNAS and see what it can do... I will also check my RAID Card and see if it can work with JBOD... fingers crossed... The machine has a couple internal SATA ports (think there are 2, could be 4) so i was thinking of using those for boot disks and SSDs later... As a follow up question: Data Deduplication: The machine, to start, will have about 5Gb RAM. I read somewhere that 20TB storage would require about 8GB RAM, depending on block size... Since i dont know block sizes, yet (i store a mix of VMs, TV Shows, Movies and backups on the NAS) I am not sure how much memory i will need (my estimate is 10TB RAW (8TB usable?) in a ZRAID1 pool, and then 3TB RAW in a striped pool). If i dont have enough memory now, can i enable DeDupe at a later stage when i add memory? Also, if i pick FreeBSD now, and want to move to, say, Nexenta, is that possible? Assuming the drives are just JBOD drives (to be confirmed) could they just get imported? Thanks. Yes, you can move between FreeBSD and Illumos based distros as long as you are at a compatible zpool version (which they currently are). I'd avoid deduplication unless you absolutely need it... it's still a bit of a kludge. Stick to compression and your world will be a much happier place. --Tim -- Tiernan O'Toole blog.lotas-smartman.net www.geekphotographer.com www.tiernanotoole.ie ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS Distro Advice
Solaris 11.1 (free for non-prod use). From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Tiernan OToole Sent: 25 February 2013 14:58 To: zfs-discuss@opensolaris.org Subject: [zfs-discuss] ZFS Distro Advice Good morning all. My home NAS died over the weekend, and it leaves me with a lot of spare drives (5 2Tb and 3 1Tb disks). I have a Dell Poweredge 2900 Server sitting in the house, which has not been doing much over the last while (bought it a few years back with the intent of using it as a storage box, since it has 8 Hot Swap drive bays) and i am now looking at building the NAS using ZFS... But, now i am confused as to what OS to use... OpenIndiana? Nexenta? FreeNAS/FreeBSD? I need something that will allow me to share files over SMB (3 if possible), NFS, AFP (for Time Machine) and iSCSI. Ideally, i would like something i can manage easily and something that works with the Dell... Any recommendations? Any comparisons to each? Thanks. -- Tiernan O'Toole blog.lotas-smartman.net www.geekphotographer.com www.tiernanotoole.ie ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zfs sata mirror slower than single disk
for what is worth.. I had the same problem and found the answer here - http://forums.freebsd.org/showthread.php?t=27207 ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS Distro Advice
On Feb 26, 2013, at 12:44 AM, Sašo Kiselkov wrote: I'd also recommend that you go and subscribe to z...@lists.illumos.org, since this list is going to get shut down by Oracle next month. Whose description still reads, everything ZFS running on illumos-based distributions. -Gary ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS Distro Advice
On 02/26/2013 03:51 PM, Gary Driggs wrote: On Feb 26, 2013, at 12:44 AM, Sašo Kiselkov wrote: I'd also recommend that you go and subscribe to z...@lists.illumos.org, since this list is going to get shut down by Oracle next month. Whose description still reads, everything ZFS running on illumos-based distributions. We've never dismissed any topic or issue as not our problem. All sensible ZFS-related discussion is welcome and taken seriously. Cheers, -- Saso ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS Distro Advice
On Tue, Feb 26, 2013 at 06:51:08AM -0800, Gary Driggs wrote:
> On Feb 26, 2013, at 12:44 AM, Sašo Kiselkov wrote:
>> I'd also recommend that you go and subscribe to z...@lists.illumos.org, since

I can't seem to find this list. Do you have an URL for that? Mailman, hopefully?

>> this list is going to get shut down by Oracle next month.
>
> Whose description still reads, everything ZFS running on illumos-based distributions. -Gary

-- Eugen Leitl http://leitl.org ICBM: 48.07100, 11.36820 http://www.ativel.com http://postbiota.org 8B29F6BE: 099D 78BA 2FD3 B014 B08A 7779 75B0 2443 8B29 F6BE ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS Distro Advice
On 02/26/2013 05:57 PM, Eugen Leitl wrote: On Tue, Feb 26, 2013 at 06:51:08AM -0800, Gary Driggs wrote: On Feb 26, 2013, at 12:44 AM, Sašo Kiselkov wrote: I'd also recommend that you go and subscribe to z...@lists.illumos.org, since I can't seem to find this list. Do you have an URL for that? Mailman, hopefully? http://wiki.illumos.org/display/illumos/illumos+Mailing+Lists -- Saso ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS Distro Advice
On Tue, Feb 26, 2013 at 06:01:39PM +0100, Sašo Kiselkov wrote: On 02/26/2013 05:57 PM, Eugen Leitl wrote: On Tue, Feb 26, 2013 at 06:51:08AM -0800, Gary Driggs wrote: On Feb 26, 2013, at 12:44 AM, Sašo Kiselkov wrote: I'd also recommend that you go and subscribe to z...@lists.illumos.org, since I can't seem to find this list. Do you have an URL for that? Mailman, hopefully? http://wiki.illumos.org/display/illumos/illumos+Mailing+Lists Oh, it's the illumos-zfs one. Had me confused. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zfs sata mirror slower than single disk
Be careful when testing ZFS with iozone. I ran a bunch of stats many years ago that produced results that did not pass a basic sanity check. There was *something* about the iozone test data that ZFS either did not like or liked very much, depending on the specific test. I eventually wrote my own very crude tool to test exactly what our workload was and started getting results that matched the reality we saw. On Jul 17, 2012, at 4:18 PM, Bob Friesenhahn bfrie...@simple.dallas.tx.us wrote: On Tue, 17 Jul 2012, Michael Hase wrote: To work around these caching effects just use a file 2 times the size of ram, iostat then shows the numbers really coming from disk. I always test like this. a re-read rate of 8.2 GB/s is really just memory bandwidth, but quite impressive ;-) Ok, the iozone benchmark finally completed. The results do suggest that reading from mirrors substantially improves the throughput. This is interesting since the results differ (better than) from my 'virgin mount' test approach:

Command line used: iozone -a -i 0 -i 1 -y 64 -q 512 -n 8G -g 256G

         KB  reclen    write  rewrite     read   reread
    8388608      64   572933  1008668  6945355  7509762
    8388608     128  2753805  2388803  6482464  7041942
    8388608     256  2508358  2331419  2969764  3045430
    8388608     512  2407497  2131829  3021579  3086763
   16777216      64   671365   879080  6323844  6608806
   16777216     128  1279401  2286287  6409733  6739226
   16777216     256  2382223  2211097  2957624  3021704
   16777216     512  2237742  2179611  3048039  3085978
   33554432      64   933712   699966  6418428  6604694
   33554432     128   459896   431640  6443848  6546043
   33554432     256       90   430989  2997615  3026246
   33554432     512   427158   430891  3042620  3100287
   67108864      64   426720   427167  6628750  6738623
   67108864     128   419328   422581      153  6743711
   67108864     256   419441   419129  3044352  3056615
   67108864     512   431053   417203  3090652  3112296
  134217728      64   417668    55434   759351   760994
  134217728     128   409383   400433   759161   765120
  134217728     256   408193   405868   763892   766184
  134217728     512   408114   403473   761683   766615
  268435456      64   418910    55239   768042   768498
  268435456     128   408990   399732   763279   766882
  268435456     256   413919   399386   760800   764468
  268435456     512   410246   403019   766627   768739

Bob -- Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss -- Paul Kraus Deputy Technical Director, LoneStarCon 3 Sound Coordinator, Schenectady Light Opera Company ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
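A small sketch of sizing the test file per that advice, i.e. at least twice physical RAM so reads actually come from disk (Solaris-flavoured; assumes prtconf reports "Memory size: N Megabytes"):

  RAM_MB=$(prtconf | awk '/Memory size/ { print $3 }')
  FILE_GB=$(( RAM_MB * 2 / 1024 ))
  iozone -a -i 0 -i 1 -y 64 -q 512 -n ${FILE_GB}g -g ${FILE_GB}g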
Re: [zfs-discuss] ZFS Distro Advice
On Feb 26, 2013, at 12:33 AM, Tiernan OToole lsmart...@gmail.com wrote: Thanks all! I will check out FreeNAS and see what it can do... I will also check my RAID Card and see if it can work with JBOD... fingers crossed... The machine has a couple internal SATA ports (think there are 2, could be 4) so i was thinking of using those for boot disks and SSDs later... As a follow up question: Data Deduplication: The machine, to start, will have about 5Gb RAM. I read somewhere that 20TB storage would require about 8GB RAM, depending on block size... Since i dont know block sizes, yet (i store a mix of VMs, TV Shows, Movies and backups on the NAS) Consider using different policies for different data. For traditional file systems, you had relatively few policy options: readonly, nosuid, quota, etc. With ZFS, dedup and compression are also policy options. In your case, dedup for your media is not likely to be a good policy, but dedup for your backups could be a win (unless you're using something that already doesn't backup duplicate data -- eg most backup utilities). A way to approach this is to think of your directory structure and create file systems to match the policies. For example: /home/richard = compressed (default top-level, since properties are inherited) /home/richard/media = compressed /home/richard/backup = compressed + dedup -- richard I am not sure how much memory i will need (my estimate is 10TB RAW (8TB usable?) in a ZRAID1 pool, and then 3TB RAW in a striped pool). If i dont have enough memory now, can i enable DeDupe at a later stage when i add memory? Also, if i pick FreeBSD now, and want to move to, say, Nexenta, is that possible? Assuming the drives are just JBOD drives (to be confirmed) could they just get imported? Thanks. On Mon, Feb 25, 2013 at 6:11 PM, Tim Cook t...@cook.ms wrote: On Mon, Feb 25, 2013 at 7:57 AM, Volker A. Brandt v...@bb-c.de wrote: Tim Cook writes: I need something that will allow me to share files over SMB (3 if possible), NFS, AFP (for Time Machine) and iSCSI. Ideally, i would like something i can manage easily and something that works with the Dell... All of them should provide the basic functionality you're looking for. None of them will provide SMB3 (at all) or AFP (without a third party package). FreeNAS has AFP built-in, including a Time Machine discovery method. The latest FreeNAS is still based on Samba 3.x, but they are aware of 4.x and will probably integrate it at some point in the future. Then you should have SMB3. I don't know how far along they are... Best regards -- Volker FreeNAS comes with a package pre-installed to add AFP support. There is no native AFP support in FreeBSD and by association FreeNAS. --Tim -- Tiernan O'Toole blog.lotas-smartman.net www.geekphotographer.com www.tiernanotoole.ie ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss -- richard.ell...@richardelling.com +1-760-896-4422 ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
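Expressed as commands, that layout might look like the following (pool name is hypothetical; compression=lz4 assumes a build that has it, otherwise use compression=on):

  zfs create -o compression=lz4 tank/home/richard   # top level; children inherit compression
  zfs create tank/home/richard/media                # inherited compression, no dedup
  zfs create -o dedup=on tank/home/richard/backup   # compressed + dedup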
[zfs-discuss] SVM ZFS
Besides copying data from /dev/md/dsk/x volume manager filesystems to new zfs filesystems does anyone know of any zfs conversion tools to make the conversion/migration from svm to zfs easier? Thanks Morris Hooten Unix SME Integrated Technology Delivery mhoo...@us.ibm.com Office: 720-342-5614___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS Distro Advice
Robert Milkowski wrote: Solaris 11.1 (free for non-prod use). But a ticking bomb if you use a cache device. -- Ian. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS Distro Advice
> Robert Milkowski wrote:
>> Solaris 11.1 (free for non-prod use).
>
> But a ticking bomb if you use a cache device.

It's been fixed in an SRU (although this is only for customers with a support contract - still, it will be in 11.2 as well). Then, I'm sure there are other bugs which are fixed in S11 and not in Illumos (and vice-versa). -- Robert Milkowski http://milek.blogspot.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS Distro Advice
Robert Milkowski wrote: Robert Milkowski wrote: Solaris 11.1 (free for non-prod use). But a ticking bomb if you use a cache device. It's been fixed in SRU (although this is only for customers with a support contract - still, will be in 11.2 as well). Then, I'm sure there are other bugs which are fixed in S11 and not in Illumos (and vice-versa). There may well be, but in seven+ years of using ZFS, this was the first one to cost me a pool. -- Ian. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss