Re: [zfs-discuss] importing pool with missing/failed log device
On 21.10.09 23:23, Paul B. Henson wrote:
> I've had a case open for a while (SR #66210171) regarding the inability to
> import a pool whose log device failed while the pool was off line. I was
> told this was CR #6343667,

CR 6343667 synopsis is "scrub/resilver has to start over when a snapshot is taken":
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6343667

so I do not see how it can be related to log removal. Could you please check
the bug number in question?

regards,
victor

> which was supposedly fixed in patches 141444-09/141445-09. However, I
> recently upgraded a system to U8, which includes that kernel patch, and I am
> still unable to import a pool with a failed log device:
>
> r...@ike ~ # zpool import
>   pool: export
>     id: 4066329346842580031
>  state: UNAVAIL
> status: One or more devices are missing from the system.
> action: The pool cannot be imported. Attach the missing devices and try again.
>    see: http://www.sun.com/msg/ZFS-8000-6X
> config:
>
>         export        UNAVAIL  missing device
>           mirror      ONLINE
>             c0t0d0    ONLINE
>             c1t0d0    ONLINE
>         [...]
>         Additional devices are known to be part of this pool, though their
>         exact configuration cannot be determined.
>
> I have not yet upgraded the pool to the new version included in U8, but I was
> not told that was a prerequisite to availing of the fix. Is this issue
> supposed to have been fixed by that CR, or did that resolve some other issue
> and I was misinformed on my support ticket?
>
> Any information appreciated, thanks...
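As a side note, the pool-version question is easy to rule in or out before chasing the CR any further. The versions the running U8 kernel supports, and the version actually recorded in the pool's vdev labels, can be compared with something like the following (the device name below is only an example of one of the pool's mirror devices; zdb -l reads the labels straight off a vdev):

# zpool upgrade -v                             # pool versions supported by this kernel
# zdb -l /dev/dsk/c0t0d0s0 | grep version      # on-disk pool version in the vdev labels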
Re: [zfs-discuss] Disk locating in OpenSolaris/Solaris 10
If you use an LSI, maybe you install the LSI Logic MPT Configuration Utility. Example of the usage : lsiutil LSI Logic MPT Configuration Utility, Version 1.61, September 18, 2008 1 MPT Port found Port Name Chip Vendor/Type/RevMPT Rev Firmware Rev IOC 1. mpt0 LSI Logic SAS1068E B3 105 011a 0 Select a device: [1-1 or 0 to quit] 1 1. Identify firmware, BIOS, and/or FCode 2. Download firmware (update the FLASH) 4. Download/erase BIOS and/or FCode (update the FLASH) 8. Scan for devices 10. Change IOC settings (interrupt coalescing) 13. Change SAS IO Unit settings 16. Display attached devices 20. Diagnostics 21. RAID actions 22. Reset bus 23. Reset target 42. Display operating system names for devices 45. Concatenate SAS firmware and NVDATA files 59. Dump PCI config space 60. Show non-default settings 61. Restore default settings 66. Show SAS discovery errors 69. Show board manufacturing information 97. Reset SAS link, HARD RESET 98. Reset SAS link 99. Reset port e Enable expert mode in menus p Enable paged mode w Enable logging Main menu, select an option: [1-99 or e/p/w or 0 to quit] 16 SAS1068E's links are down, down, down, down, 3.0 G, 3.0 G, 3.0 G, 3.0 G B___T SASAddress PhyNum Handle Parent Type 500605b000eea990 0001 SAS Initiator 500605b000eea991 0002 SAS Initiator 500605b000eea992 0003 SAS Initiator 500605b000eea993 0004 SAS Initiator 500605b000eea994 0005 SAS Initiator 500605b000eea995 0006 SAS Initiator 500605b000eea996 0007 SAS Initiator 500605b000eea997 0008 SAS Initiator 50030480003d95ff 4 00090005 Edge Expander 0 10 50030480003d95c4 4 000a0009 SATA Target 0 11 50030480003d95c5 5 000b0009 SATA Target 0 12 50030480003d95c6 6 000c0009 SATA Target 0 13 50030480003d95c7 7 000d0009 SATA Target 0 14 50030480003d95c8 8 000e0009 SATA Target 0 15 50030480003d95c9 9 000f0009 SATA Target 0 17 50030480003d95ca10 00100009 SATA Target 0 16 50030480003d95cb11 00110009 SATA Target 0 18 50030480003d95cc12 00120009 SATA Target 0 19 50030480003d95cd13 00130009 SATA Target 0 20 50030480003d95ce14 00140009 SATA Target 0 21 50030480003d95cf15 00150009 SATA Target 0 22 50030480003d95d016 00160009 SATA Target 0 23 50030480003d95d117 00170009 SATA Target 0 24 50030480003d95d218 00180009 SATA Target 0 25 50030480003d95d319 00190009 SATA Target 0 26 50030480003d95d622 001a0009 SATA Target 0 8 50030480003d95fd36 001b0009 SAS Initiator and Target The colum PhyNum , im my case points out to the drive disk slot in the JBOD chassis. However i don't know how this works with multipath. The ideal solution would be to use the cfgadm with the hardware option, to put the disk led blinking. Something like, as seen in http://docs.sun.com/app/docs/doc/816-5166/cfgadm-scsi-1m?a=view : Example 6 Display the Value of the Locator for a Disk The following command displays the value of the locator for a disk. This example is specific to the SPARC Enterprise Server family: # *cfgadm -x locator c0::dsk/c0t6d0* The system responds with the following: DiskLed c0t6d0 locator=on But maybe this option is just for SPARC with SCSI? Bruno SHOUJIN WANG wrote: Hi there, What I am tring to do is: Build a NAS storage server based on the following hardware architecture: Server--SAS HBA---SAS JBOD I plugin 2 SAS HBA cards into a X86 box, I also have 2 SAS I/O Modules on SAS JBOD. From each HBA card, I have one SAS cable which connects to SAS JBOD. Configured MPT successfully on server, I can see the single multipahted disks likes the following: r...@super01:~# format Searching for disks...done AVAILABLE DISK SELECTIONS: 0. 
c0t5000C5000D34BEDFd0 SEAGATE-ST31000640SS-0001-931.51GB
   /scsi_vhci/d...@g5000c5000d34bedf
1. c0t5000C5000D34BF37d0 SEAGATE-ST31000640SS-0001-931.51GB
   /scsi_vhci/d...@g5000c5000d34bf37
2. c0t5000C5000D34C727d0 SEAGATE-ST31000640SS-0001-931.51GB
   /scsi_vhci/d...@g5000c5000d34c727
3. c0t5000C5000D34D0C7d0 SEAGATE-ST31000640SS-0001-931.51GB
   /scsi_vhci/d...@g5000c5000d34d0c7
4. c0t5000C5000D34D85Bd0 SEAGATE-ST31000640SS-0001-931.51GB
   /scsi_vhci/d...@g5000c5000d34d85b

The problem is: if one of the disks fails, I don't know how to locate the disk in the chassis, which makes replacing a failed disk difficult. Is there any utility in OpenSolaris which can be used to locate/blink the failed disk?
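One more idea, short of a blinking LED: pull the serial number of the suspect disk from the OS and match it against the label on the drive or your chassis slot documentation. Something along these lines (the device name is just an example taken from the format listing above):

# iostat -En c0t5000C5000D34BEDFd0      # prints Vendor, Product and Serial No. for that device
# cfgadm -al                            # lists attachment points per controller

lsiutil's option 42 ("Display operating system names for devices") can also help tie the c0tXd0 names the OS sees back to the PhyNum/slot listing from option 16, though I have not tried that with multipath in the picture.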
[zfs-discuss] SNV_125 MPT warning in logfile
Hi all, Recently i upgrade from snv_118 to snv_125, and suddently i started to see this messages at /var/adm/messages : Oct 22 12:54:37 SAN02 scsi: [ID 243001 kern.warning] WARNING: /p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0): Oct 22 12:54:37 SAN02 mpt_handle_event: IOCStatus=0x8000, IOCLogInfo=0x3112011a Oct 22 12:56:47 SAN02 scsi: [ID 243001 kern.warning] WARNING: /p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0): Oct 22 12:56:47 SAN02 mpt_handle_event_sync: IOCStatus=0x8000, IOCLogInfo=0x3112011a Oct 22 12:56:47 SAN02 scsi: [ID 243001 kern.warning] WARNING: /p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0): Oct 22 12:56:47 SAN02 mpt_handle_event: IOCStatus=0x8000, IOCLogInfo=0x3112011a Oct 22 12:56:50 SAN02 scsi: [ID 243001 kern.warning] WARNING: /p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0): Oct 22 12:56:50 SAN02 mpt_handle_event_sync: IOCStatus=0x8000, IOCLogInfo=0x3112011a Oct 22 12:56:50 SAN02 scsi: [ID 243001 kern.warning] WARNING: /p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0): Oct 22 12:56:50 SAN02 mpt_handle_event: IOCStatus=0x8000, IOCLogInfo=0x3112011a Is this a symptom of a disk error or some change was made in the driver?,that now i have more information, where in the past such information didn't appear? Thanks, Bruno I'm using a LSI Logic SAS1068E B3 and i within lsiutil i have this behaviour : 1 MPT Port found Port Name Chip Vendor/Type/RevMPT Rev Firmware Rev IOC 1. mpt0 LSI Logic SAS1068E B3 105 011a 0 Select a device: [1-1 or 0 to quit] 1 1. Identify firmware, BIOS, and/or FCode 2. Download firmware (update the FLASH) 4. Download/erase BIOS and/or FCode (update the FLASH) 8. Scan for devices 10. Change IOC settings (interrupt coalescing) 13. Change SAS IO Unit settings 16. Display attached devices 20. Diagnostics 21. RAID actions 22. Reset bus 23. Reset target 42. Display operating system names for devices 45. Concatenate SAS firmware and NVDATA files 59. Dump PCI config space 60. Show non-default settings 61. Restore default settings 66. Show SAS discovery errors 69. Show board manufacturing information 97. Reset SAS link, HARD RESET 98. Reset SAS link 99. Reset port e Enable expert mode in menus p Enable paged mode w Enable logging Main menu, select an option: [1-99 or e/p/w or 0 to quit] 20 1. Inquiry Test 2. WriteBuffer/ReadBuffer/Compare Test 3. Read Test 4. Write/Read/Compare Test 8. Read Capacity / Read Block Limits Test 12. Display phy counters 13. Clear phy counters 14. SATA SMART Read Test 15. SEP (SCSI Enclosure Processor) Test 18. Report LUNs Test 19. Drive firmware download 20. Expander firmware download 21. Read Logical Blocks 99. 
Reset port e Enable expert mode in menus p Enable paged mode w Enable logging Diagnostics menu, select an option: [1-99 or e/p/w or 0 to quit] 12 Adapter Phy 0: Link Down, No Errors Adapter Phy 1: Link Down, No Errors Adapter Phy 2: Link Down, No Errors Adapter Phy 3: Link Down, No Errors Adapter Phy 4: Link Up, No Errors Adapter Phy 5: Link Up, No Errors Adapter Phy 6: Link Up, No Errors Adapter Phy 7: Link Up, No Errors Expander (Handle 0009) Phy 0: Link Up Invalid DWord Count 79,967,229 Running Disparity Error Count63,036,893 Loss of DWord Synch Count 113 Phy Reset Problem Count 0 Expander (Handle 0009) Phy 1: Link Up Invalid DWord Count 79,967,207 Running Disparity Error Count78,339,626 Loss of DWord Synch Count 113 Phy Reset Problem Count 0 Expander (Handle 0009) Phy 2: Link Up Invalid DWord Count 76,717,646 Running Disparity Error Count73,334,563 Loss of DWord Synch Count 113 Phy Reset Problem Count 0 Expander (Handle 0009) Phy 3: Link Up Invalid DWord Count 79,896,409 Running Disparity Error Count76,199,329 Loss of DWord Synch Count 113 Phy Reset Problem Count 0 Expander (Handle 0009) Phy 4: Link Up, No Errors Expander (Handle 0009) Phy 5: Link Up, No Errors Expander (Handle 0009) Phy 6: Link Up, No Errors Expander (Handle 0009) Phy 7: Link Up, No Errors Expander (Handle 0009) Phy 8: Link Up, No Errors Expander (Handle 0009) Phy 9: Link Up, No Errors Expander (Handle 0009) Phy 10: Link Up, No Errors Expander (Handle 0009) Phy 11: Link Up, No Errors Expander (Handle 0009) Phy 12: Link Up, No Errors Expander (Handle 0009) Phy 13: Link Up, No Errors Expander (Handle 0009) Phy 14: Link Up, No Errors Expander (Handle 0009)
Re: [zfs-discuss] Setting up an SSD ZIL - Need A Reality Check
> Replacing failed disks is easy when PERC is doing the RAID. Just remove the
> failed drive and replace with a good one, and the PERC will rebuild
> automatically.

Sorry, not correct. When you replace a failed drive, the perc card doesn't know for certain that the new drive you're adding is meant to be a replacement. For all it knows, you could coincidentally be adding new disks for a new VirtualDevice which already contains data, during the failure state of some other device. So it will not automatically resilver (which would be a permanently destructive process, applied to a disk which is not *certainly* meant for destruction). You have to open the perc config interface and tell it this disk is a replacement for the old disk (probably you're just saying "this disk is the new global hotspare"), or else the new disk will sit there like a bump on a log, doing nothing.
Re: [zfs-discuss] Setting up an SSD ZIL - Need A Reality Check
> The Intel specified random write IOPS are with the cache enabled and without
> cache flushing. They also carefully only use a limited span of the device,
> which fits most perfectly with how the device is built.

How do you know this? This sounds far more detailed than anything the average person could ever know.
Re: [zfs-discuss] Setting up an SSD ZIL - Need A Reality Check
Actually, I think this is a case of crossed wires. This issue was reported a while back on a news site for the X25-M G2. Somebody pointed out that these devices have 8GB of cache, which is exactly the dataset size they use for the iops figures. The X25-E datasheet however states that while write cache is enabled, the iops figures are over the entire drive. And looking at the X25-M G2 datasheet again, it states that the measurements are over 8GB of range, but these come with 32MB of cache, so I think that was also a false alarm. -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] raidz ZFS Best Practices wiki inconsistency
http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide#RAID-Z_Configuration_Requirements_and_Recommendations says that the number of disks in a RAIDZ should be (N+P) with N = {2,4,8} and P = {1,2}. But if you go down the page just a little further to the Thumper configuration examples, none of the 3 examples follow this recommendation!

I will have 10 disks to put into a RAIDZ. I would like as little waste as possible, so that means just 1 hot spare, and a 3,3,3 config for the remaining 9 is not appealing. Should I do a single 9-disk RAIDZ, per the guideline, or should I split it into a 4-disk and a 5-disk RAIDZ?

This is for engineering data. My workload isn't established yet, but from talking to the guys the working set would fit in a TB and would just be local to the engineers' workstations, while the file server will just store infrequently used data. As such, I'm inclined to do a single 9-disk RAIDZ and maximize the available disk space, which at the same time follows the configuration guideline.

I'm pretty sure I already know the correct answer, as I remember when this guideline was created and why. Besides just thinking out loud, I do want to emphasize the inconsistency on the wiki and suggest that it be updated or a comment added.

-frank
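P.S. For concreteness, the two layouts I'm weighing would be created roughly like this (disk names made up):

# zpool create tank raidz c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0 c1t6d0 c1t7d0 c1t8d0 spare c1t9d0

versus

# zpool create tank raidz c1t0d0 c1t1d0 c1t2d0 c1t3d0 raidz c1t4d0 c1t5d0 c1t6d0 c1t7d0 c1t8d0 spare c1t9d0

The first is the single 9-disk RAIDZ (8+1, matching the guideline); the second is the 4+5 split as two top-level raidz vdevs in one pool.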
Re: [zfs-discuss] raidz ZFS Best Practices wiki inconsistency
Thanks for your comments, Frank. I will take a look at the inconsistencies. Cindy On 10/22/09 08:29, Frank Cusack wrote: http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide#RAID-Z_Configuration_Requirements_and_Recommendations says that the number of disks in a RAIDZ should be (N+P) with N = {2,4,8} and P = {1,2}. But if you go down the page just a little further to the thumper configuration examples, none of the 3 examples follow this recommendation! I will have 10 disks to put into a RAIDZ. I would like as little waste as possible, so that means just 1 hot spare, and a 3,3,3 config for the remaining 9 is not appealing. Should I do a single 9 disk RAIDZ, per the guideline, or should I do 4,5. This is for engineering data. My workload isn't established yet but from talking to the guys the working set would fit in a TB and just be local to engineer workstations, while the file server will just store infrequently used data. As such, I'm inclined to do a single 9 disk RAIDZ and maximize the available disk space, which at the same time follows the configuration guideline. I'm pretty sure I already know the correct answer as I remember when this guideline was created and why. Besides just thinking out loud, I do want to emphasize the inconsistency on the wiki and suggest that it be updated or a comment added. -frank ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] importing pool with missing/failed log device
On Thu, 22 Oct 2009, Victor Latushkin wrote: CR 6343667 synopsis is scrub/resilver has to start over when a snapshot is taken: http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6343667 so I do not see how it can be related to log removal. Could you please check bug number in question? Ack, my bad, too many open cases :(, sorry. The correct bug for the inquiry is CR 6707530. Thanks... -- Paul B. Henson | (909) 979-6361 | http://www.csupomona.edu/~henson/ Operating Systems and Network Analyst | hen...@csupomona.edu California State Polytechnic University | Pomona CA 91768 ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Setting up an SSD ZIL - Need A Reality Check
On Thu, 22 Oct 2009, Marc Bevand wrote: Bob Friesenhahn bfriesen at simple.dallas.tx.us writes: For random write I/O, caching improves I/O latency not sustained I/O throughput (which is what random write IOPS usually refer to). So Intel can't cheat with caching. However they can cheat by benchmarking a brand new drive instead of an aged one. With FLASH devices, a sufficiently large write cache can improve random write I/O. One can imagine that the wear leveling logic could be used to do tricky remapping so that several random writes actually lead to sequential writes to the same FLASH superblock so only one superblock needs to be updated and the parts of the old superblocks which would have been overwritten are marked as unused. This of course requires rather advanced remapping logic at a finer-grained resolution than the superblock. When erased space becomes tight (or on a periodic basis), the data in several sparsely-used superblocks are migrated to a different superblock in a more compact way (along with requisite logical block remapping) to reclaim space. It is worth developing such remapping logic since FLASH erasures and re-writes are so expensive. They also carefully only use a limited span of the device, which fits most perfectly with how the device is built. AFAIK, for the X25-E series, they benchmark random write IOPS on a 100% span. You may be confusing it with the X25-M series with which they actually clearly disclose two performance numbers: 350 random write IOPS on 8GB span, and 3.3k on 100% span. See http://www.intel.com/cd/channel/reseller/asmo-na/eng/products/nand/tech/425265.htm You are correct that I interpreted the benchmark scenarios from the X25-M series documentation. It seems reasonable for the same manufacturer to use the same benchmark methodology for similar products. Then again, they are still new at this. Bob -- Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer,http://www.GraphicsMagick.org/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] SNV_125 MPT warning in logfile
Hi Bruno, I see some bugs associated with these messages (6694909) that point to an LSI firmware upgrade that cause these harmless errors to display. According to the 6694909 comments, this issue is documented in the release notes. As they are harmless, I wouldn't worry about them. Maybe someone from the driver group can comment further. Cindy On 10/22/09 05:40, Bruno Sousa wrote: Hi all, Recently i upgrade from snv_118 to snv_125, and suddently i started to see this messages at /var/adm/messages : Oct 22 12:54:37 SAN02 scsi: [ID 243001 kern.warning] WARNING: /p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0): Oct 22 12:54:37 SAN02 mpt_handle_event: IOCStatus=0x8000, IOCLogInfo=0x3112011a Oct 22 12:56:47 SAN02 scsi: [ID 243001 kern.warning] WARNING: /p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0): Oct 22 12:56:47 SAN02 mpt_handle_event_sync: IOCStatus=0x8000, IOCLogInfo=0x3112011a Oct 22 12:56:47 SAN02 scsi: [ID 243001 kern.warning] WARNING: /p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0): Oct 22 12:56:47 SAN02 mpt_handle_event: IOCStatus=0x8000, IOCLogInfo=0x3112011a Oct 22 12:56:50 SAN02 scsi: [ID 243001 kern.warning] WARNING: /p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0): Oct 22 12:56:50 SAN02 mpt_handle_event_sync: IOCStatus=0x8000, IOCLogInfo=0x3112011a Oct 22 12:56:50 SAN02 scsi: [ID 243001 kern.warning] WARNING: /p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0): Oct 22 12:56:50 SAN02 mpt_handle_event: IOCStatus=0x8000, IOCLogInfo=0x3112011a Is this a symptom of a disk error or some change was made in the driver?,that now i have more information, where in the past such information didn't appear? Thanks, Bruno I'm using a LSI Logic SAS1068E B3 and i within lsiutil i have this behaviour : 1 MPT Port found Port Name Chip Vendor/Type/RevMPT Rev Firmware Rev IOC 1. mpt0 LSI Logic SAS1068E B3 105 011a 0 Select a device: [1-1 or 0 to quit] 1 1. Identify firmware, BIOS, and/or FCode 2. Download firmware (update the FLASH) 4. Download/erase BIOS and/or FCode (update the FLASH) 8. Scan for devices 10. Change IOC settings (interrupt coalescing) 13. Change SAS IO Unit settings 16. Display attached devices 20. Diagnostics 21. RAID actions 22. Reset bus 23. Reset target 42. Display operating system names for devices 45. Concatenate SAS firmware and NVDATA files 59. Dump PCI config space 60. Show non-default settings 61. Restore default settings 66. Show SAS discovery errors 69. Show board manufacturing information 97. Reset SAS link, HARD RESET 98. Reset SAS link 99. Reset port e Enable expert mode in menus p Enable paged mode w Enable logging Main menu, select an option: [1-99 or e/p/w or 0 to quit] 20 1. Inquiry Test 2. WriteBuffer/ReadBuffer/Compare Test 3. Read Test 4. Write/Read/Compare Test 8. Read Capacity / Read Block Limits Test 12. Display phy counters 13. Clear phy counters 14. SATA SMART Read Test 15. SEP (SCSI Enclosure Processor) Test 18. Report LUNs Test 19. Drive firmware download 20. Expander firmware download 21. Read Logical Blocks 99. 
Reset port e Enable expert mode in menus p Enable paged mode w Enable logging Diagnostics menu, select an option: [1-99 or e/p/w or 0 to quit] 12 Adapter Phy 0: Link Down, No Errors Adapter Phy 1: Link Down, No Errors Adapter Phy 2: Link Down, No Errors Adapter Phy 3: Link Down, No Errors Adapter Phy 4: Link Up, No Errors Adapter Phy 5: Link Up, No Errors Adapter Phy 6: Link Up, No Errors Adapter Phy 7: Link Up, No Errors Expander (Handle 0009) Phy 0: Link Up Invalid DWord Count 79,967,229 Running Disparity Error Count63,036,893 Loss of DWord Synch Count 113 Phy Reset Problem Count 0 Expander (Handle 0009) Phy 1: Link Up Invalid DWord Count 79,967,207 Running Disparity Error Count78,339,626 Loss of DWord Synch Count 113 Phy Reset Problem Count 0 Expander (Handle 0009) Phy 2: Link Up Invalid DWord Count 76,717,646 Running Disparity Error Count73,334,563 Loss of DWord Synch Count 113 Phy Reset Problem Count 0 Expander (Handle 0009) Phy 3: Link Up Invalid DWord Count 79,896,409 Running Disparity Error Count76,199,329 Loss of DWord Synch Count 113 Phy Reset Problem Count 0 Expander (Handle 0009) Phy 4: Link Up, No Errors Expander (Handle 0009) Phy 5: Link Up, No Errors Expander (Handle 0009) Phy 6: Link Up, No Errors Expander (Handle 0009) Phy 7: Link Up, No
Re: [zfs-discuss] Setting up an SSD ZIL - Need A Reality Check
Interesting. We must have different setups with our PERCs. Mine have always auto rebuilt. -- Scott Meilicke

On Oct 22, 2009, at 6:14 AM, Edward Ned Harvey sola...@nedharvey.com wrote: Replacing failed disks is easy when PERC is doing the RAID. Just remove the failed drive and replace with a good one, and the PERC will rebuild automatically. Sorry, not correct. When you replace a failed drive, the perc card doesn't know for certain that the new drive you're adding is meant to be a replacement. For all it knows, you could coincidentally be adding new disks for a new VirtualDevice which already contains data, during the failure state of some other device. So it will not automatically resilver (which would be a permanently destructive process, applied to a disk which is not *certainly* meant for destruction). You have to open the perc config interface, tell it this disk is a replacement for the old disk (probably you're just saying This disk is the new global hotspare) or else the new disk will sit there like a bump on a log. Doing nothing.
Re: [zfs-discuss] strange results ...
jel+...@cs.uni-magdeburg.de said: 2nd) Never had a Sun STK RAID INT before. Actually my intention was to create a zpool mirror of sd0 and sd1 for boot and logs, and a 2x2-way zpool mirror with the 4 remaining disks. However, the controller seems not to support JBODs :( - which is also bad, since we can't simply put those disks into another machine with a different controller without data loss, because the controller seems to use its own format under the hood. Yes, those Adaptec/STK internal RAID cards are annoying for use with ZFS. You also cannot replace a failed disk without using the STK RAID software to configure the new disk as a standalone volume (before zpool replace). Fortunately you probably don't need to boot into the BIOS-level utility, I think you can use the Adaptec StorMan utilities from within the OS, if you remembered to install them. Also the 256MB BBCache seems to be a little bit small for ZIL even if one would know, how to configure it ... Unless you have an external (non-NV cached) pool on the same server, you wouldn't gain anything from setting up a separate ZIL in this case. All your internal drives have NV cache without doing anything special. So what would you recommend? Creating 2 appropriate STK INT arrays and using both as a single zpool device, i.e. without ZFS mirror devs and 2nd copies? Here's what we did: Configure all internal disks as standalone volumes on the RAID card. All those volumes have the battery-backed cache enabled. The first two 146GB drives got sliced in two: the first half of each disk became the boot/root mirror pool. The 2nd half was used for a separate-ZIL mirror, applied to an external SATA pool. Our remaining internal drives were configured into a mirrored ZFS pool for database transaction logs. No need for a separate ZIL there, since the internal drives effectively have NV cache as far as ZFS is concerned. Yes, the 256MB cache is small, but if it fills up, it is backed by the 10kRPM internal SAS drives, which should have decent latency when compared to external SATA JBOD drives. And even this tiny NV cache makes a huge difference when used on an NFS server: http://acc.ohsu.edu/~hakansom/j4400_bench.html Regards, Marion ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
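For reference, the separate-ZIL step in that layout is essentially a one-liner; pool and slice names here are illustrative rather than our actual configuration:

# zpool add satapool log mirror c0t0d0s1 c0t1d0s1      # second halves of the two internal drives as a mirrored slog
# zpool status satapool                                # the log mirror shows up as its own top-level vdev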
[zfs-discuss] [Fwd: snv_123: kernel memory leak?]
anyone? ---BeginMessage--- Hi, pre ::status debugging live kernel (64-bit) on mk-archive-1 operating system: 5.11 snv_123 (i86pc) ::system set noexec_user_stack_log=0x1 [0t1] set noexec_user_stack=0x1 [0t1] set snooping=0x1 [0t1] set zfs:zfs_arc_max=0x28000 [0t10737418240] ::memstat Page SummaryPagesMB %Tot Kernel1843301 7200 44% ZFS File Data 1651701 6451 40% Anon 202473 7905% Exec and libs 934 30% Page cache 4369170% Free (cachelist) 2084 80% Free (freelist)462699 1807 11% Total 4167561 16279 Physical 4167560 16279 We are experiencing out-of-memory issues during the night on this server and our apps do not use much memory but are during lots of disk and network IO. The server is x4500 with 16GB of memory, the filesystem is of course ZFS. I'm concerned with the kernel size here. I'm getting applications crashing and errors like: WARNING: Sorry, no swap space to grow stack for pid 5991 (cron) WARNING: Sorry, no swap space to grow stack for pid 6092 (mv) WARNING: Sorry, no swap space to grow stack for pid 18481 (ggrep) WARNING: Sorry, no swap space to grow stack for pid 18562 (cron) WARNING: /tmp: File system full, swap space limit exceeded WARNING: /etc/svc/volatile: File system full, swap space limit exceeded And no, we are not filling /tmp and besides /tmp is limited to 1GB anyway. Couple of highlights from ::kmastat [...] cachebufbufbufmemory alloc alloc namesize in use totalin use succeed fail - -- -- -- -- - - kmem_magazine_143 1152 53621 55065 75182080B 15082603 [b]5195[/b] [...] vmem memory memorymemory alloc alloc name in use totalimport succeed fail - -- --- -- - - heap [b]13564526592B[/b] 1077569126400B 0B 34565103 0 [...] kmem_metadata 586997760B 623640576B 623640576B912072 0 kmem_msb 539336704B 539336704B 539336704B864680 [b]5198[/b] [...] kmem_firewall_va 141729792B 141729792B 141729792B578915 0 kmem_firewall 0B 0B 0B 0 0 kmem_oversize 141532041B 141729792B 141729792B578904 [b]11[/b] [...] kmem_va [b]12148248576B[/b] 12148248576B 12148248576B 31994594 0 kmem_default 5738831872B 5738831872B 5738831872B 187061764 0 [...] zfs_file_data 6734151680B 17070817280B 0B 90214511 0 zfs_file_data_buf 6734151680B 6734151680B 6734151680B 90214511 0 [...] Is it a memory fragmentation or something else? A full ::kmastat output ::kmastat cachebufbufbufmemory alloc alloc namesize in use totalin use succeed fail - -- -- -- -- - - kmem_magazine_1 16 2893 32379528384B 94677416 1 kmem_magazine_3 32 5111 5875192512B 11584154 0 kmem_magazine_7 64 15917 17670 1167360B 24614491 0 kmem_magazine_15 128 4478 4619610304B 6863634 0 kmem_magazine_31 256 7823 12150 3317760B 3485723 0 kmem_magazine_47 384 2796 3280 1343488B 1616387 0 kmem_magazine_63 512550 3353 1961984B 1256504 0 kmem_magazine_95 768 13035 18095 14823424B 18231914 0 kmem_magazine_143 1152 53621 55065 75182080B 15082603 5195 kmem_slab_cache 72 822572 1785630 132980736B 275428877 0 kmem_bufctl_cache 24 6384507 12425301 304754688B 535280109 0 kmem_bufctl_audit_cache 192 0 0 0B 0 0 kmem_va_40964096 1091176 1998784 3892051968B 49062569 0 kmem_va_81928192 17696 18896 154796032B 4835863 0 kmem_va_12288 12288451 3580 46923776B 9282713 0 kmem_va_16384 16384 26040 182176 2984771584B 106103361 0 kmem_va_20480 20480 5974 6984 152567808B 9298381 0 kmem_va_24576 24576 94365 9568256B 2731797 0 kmem_va_28672 28672270 1700 55705600B 9520293 0 kmem_va_32768 32768406620 20316160B 2796821 0 kmem_alloc_8 8 203502
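The snapshots above were taken interactively from the live kernel; if it is useful I can also collect them periodically for trending while the problem develops, e.g.:

# echo '::memstat' | mdb -k
# echo '::kmastat' | mdb -k > /var/tmp/kmastat.`date +%Y%m%d%H%M`

And, if a reboot is acceptable, setting kmem_flags=0xf in /etc/system would make '::findleaks' in mdb -k usable for actual leak detection -- that part is just a suggestion, not something I have run on this box.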
Re: [zfs-discuss] ZFS disk failure question
Hi Jason, Since spare replacement is an important process, I've rewritten this section to provide 3 main examples, here: http://docs.sun.com/app/docs/doc/817-2271/gcvcw?a=view Scroll down the section: Activating and Deactivating Hot Spares in Your Storage Pool Example 4–7 Manually Replacing a Disk With a Hot Spare Example 4–8 Detaching a Hot Spare After the Failed Disk is Replaced Example 4–9 Detaching a Failed Disk and Using the Hot Spare The third example is your scenario. I finally listened to the answer, which is you must detach the original disk if you want to continue to use the spare and replace the original disk later. It all works as described. I see some other improvements coming with spare replacement and will provide details when they are available. Thanks, Cindy On 10/14/09 15:54, Jason Frank wrote: See, I get overly literal when working on failed production storage (and yes, I do have backups...) I wasn't wanting to cancel the in-progress spare replacement. I had a completed spare replacement, and I wanted to make it official. So, that didn't really fit my scenario either. I'm glad you agree on the brevity of the detach subcommand man page. I would guess that the intricacies of the failure modes would probably lend itself to richer content than a man page. I'd really like to see some kind of web based wizard to walk through it I doubt I'd get motivated to write it myself though. The web page Cindy pointed to does not cover how to make the replacement official either. It gets close. But at the end, it detaches the hot spare, and not the original disk. Everything seems to be close, but not quite there. Of course, now that I've been through this once, I'll remember all. I'm just thinking of the children. Also, I wanted to try and reconstruct all of my steps from zpool history -i tank. According to that, zpool decided to replace t7 with t11 this morning (why wasn't it last night?), and I offlined, onlined and detach of t7 and I was OK. I did notice that the history records internal scrubs, but not resilvers, It also doesn't record failed commands, or disk failures in a zpool. It would be sweet to have a line that said something like marking vdev /dev/dsk/c8t7d0s0 as UNAVAIL due to X read errors in Y minutes, Then we can really see what happened. Jason On Wed, Oct 14, 2009 at 4:32 PM, Eric Schrock eric.schr...@sun.com wrote: On 10/14/09 14:26, Jason Frank wrote: Thank you, that did the trick. That's not terribly obvious from the man page though. The man page says it detaches the devices from a mirror, and I had a raidz2. Since I'm messing with production data, I decided I wasn't going to chance it when I was reading the man page. You might consider changing the man page, and explaining a little more what it means, maybe even what the circumstances look like where you might use it. This is covered in the Hot Spares section of the manpage: An in-progress spare replacement can be cancelled by detach- ing the hot spare. If the original faulted device is detached, then the hot spare assumes its place in the confi- guration, and is removed from the spare list of all active pools. It is true that the description for zpool detach is overly brief and could be expanded to include this use case. - Eric -- Eric Schrock, Fishworkshttp://blogs.sun.com/eschrock ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
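To summarize the third example in command form (pool and device names below follow Jason's earlier mail and are only illustrative):

# zpool detach tank c8t7d0       # detach the failed original; the spare becomes a permanent pool member
  ... physically replace the bad disk ...
# zpool add tank spare c8t7d0    # add the new disk back into the pool as a spare

Detaching the spare instead (zpool detach tank c8t11d0) would cancel the spare-in and keep the original device in the configuration.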
Re: [zfs-discuss] strange results ...
Jens Elkner wrote:
> Hmmm, wondering about IMHO strange ZFS results ...
>
> X4440: 4x6 2.8GHz cores (Opteron 8439 SE), 64 GB RAM
> 6x Sun STK RAID INT V1.0 (Hitachi H103012SCSUN146G SAS)
> Nevada b124
>
> Started with a simple test using zfs on c1t0d0s0: cd /var/tmp
> (1) time sh -c 'mkfile 32g bla ; sync'        0.16u 19.88s 5:04.15 6.5%
> (2) time sh -c 'mkfile 32g blabla ; sync'     0.13u 46.41s 5:22.65 14.4%
> (3) time sh -c 'mkfile 32g blablabla ; sync'  0.19u 26.88s 5:38.07 8.0%
> chmod 644 b*
> (4) time dd if=bla of=/dev/null bs=128k
>     262144+0 records in, 262144+0 records out  0.26u 25.34s 6:06.16 6.9%
> (5) time dd if=blabla of=/dev/null bs=128k
>     262144+0 records in, 262144+0 records out  0.15u 26.67s 4:46.63 9.3%
> (6) time dd if=blablabla of=/dev/null bs=128k
>     262144+0 records in, 262144+0 records out  0.10u 20.56s 0:20.68 99.9%
>
> So 1-3 is more or less expected (~97..108 MB/s write). However 4-6 looks
> strange: 89, 114 and 1585 MB/s read! Since the arc size is ~55+-2GB (at
> least arcstat.pl says so), I guess (6) reads from memory completely.
> Hmm - maybe. However, I would expect that when repeating 5-6, 'blablabla'
> gets replaced by 'bla' or 'blabla'. But the numbers say that 'blablabla' is
> kept in the cache, since I get almost the same results as in the first run
> (and zpool iostat/arcstat.pl show almost no activity at all for blablabla).
> So is this a ZFS bug? Or does the OS do some magic here?

IIRC, when ZFS detects sequential reads on a given file it stops caching blocks for it. Because file #6 was created last, all of its blocks are already cached in the ARC; then, when reading #4 and #5, ZFS detected the sequential reads and did not put that data in the cache, leaving the last-written file entirely cached. While this is the desired behavior for many workloads, for many others it is not (like scanning large log files with grep-like tools, which then never get cached...).

-- Robert Milkowski http://milek.blogspot.com
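An easy way to sanity-check this on the box is to watch the ARC kstats while re-running the dd reads, e.g.:

# kstat -p zfs:0:arcstats:size zfs:0:arcstats:hits zfs:0:arcstats:misses

If (4) and (5) barely move the hit counter while (6) is served almost entirely from the ARC, that matches the explanation above. (These are the standard arcstats kstat names; the exact caching behaviour will of course depend on the build.)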
[zfs-discuss] Only a few days left for Online Registration: Solaris Security Summit Nov 3rd
Hello All There is still time to register online. You will also be available to register on-site as well. Just to give you an idea of the presentation that will be given. * Presentation: Kerberos Authentication for Web Security * Presentation: Protecting Oracle Applications with Built-In Solaris Security Features * Presentation: H/W based isolation and security for Virtual Machine Network * Presentation: ZFS-Crypto Overview Hope to see you there!! - To: Developers and Students You are invited to participate in the first OpenSolaris Security Summit Solaris Security Summit Tuesday, November 3rd, 2009 Baltimore Marriott Waterfront 700 Aliceanna Street Baltimore, Maryland 21202 Join us as we explore the latest trends of Solaris Security technologies, as well as key insights from security community members, technologists, and users. You will also have the unique opportunity to hear from our keynote speaker William Cheswick, Lead Member of the Technical Staff at ATT labs Bio: Ches is an early innovator in Internet security. He is known for his work in firewalls, proxies, and Internet mapping at Bell Labs and Lumeta Corp. He is best known for the book he co-authored with Steve Bellovin and now Avi Rubin, Firewalls and Internet Security; Repelling the Wily Hacker. Ches is now a member of the technical staff at ATT Labs - Research in Florham Park, NJ, where he is working on security, visualization, user interfaces, and a variety of other things. Registration now available! http://wikis.sun.com/display/secsummit09/ http://www.usenix.org/events/lisa09/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] SNV_125 MPT warning in logfile
Cindy: How can I view the bug report you referenced? Standard methods show me that the bug number is valid (6694909) but no content or notes. We are having similar messages appear with snv_118 with a busy LSI controller, especially during scrubbing, and I'd be interested to see what they mentioned in that report. Also, the LSI firmware updates for the LSISAS3081E (the controller we use) don't usually come with release notes indicating what has changed in each firmware revision, so I'm not sure where they got that idea from.
Re: [zfs-discuss] SNV_125 MPT warning in logfile
Adam Cheal wrote: Cindy: How can I view the bug report you referenced? Standard methods show my the bug number is valid (6694909) but no content or notes. We are having similar messages appear with snv_118 with a busy LSI controller, especially during scrubbing, and I'd be interested to see what they mentioned in that report. Also, the LSI firmware updates for the LSISAS3081E (the controller we use) don't usually come with release notes indicating what has changed in each firmware revision, so I'm not sure where they got that idea from. Hi Adam, unfortunately, you can't see that bug from outside. The evaluation from LSI is very clear that this is a firmware issue rather than a driver issue, and is claimed to be fixed in LSI BIOS v6.26.00 FW 1.27.02 (aka Phase 15) cheers, James -- Senior Kernel Software Engineer, Solaris Sun Microsystems http://blogs.sun.com/jmcp http://www.jmcp.homeunix.com/blog ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] zpool with very different sized vdevs?
I have a new array of 4x1.5TB drives running fine. I also have the old array of 4x400GB drives in the box on a separate pool for testing. I was planning to have the old drives just be a backup file store, so I could keep snapshots and such over there for important files. I was wondering if it makes any sense to add the older drives to the new pool. Reliability might be lower as they are older drives, so if I were to lose 2 of them, things could get ugly. I'm just curious whether it would make any sense to do something like this.
Re: [zfs-discuss] SNV_125 MPT warning in logfile
James: We are running Phase 16 on our LSISAS3801E's, and have also tried the recently released Phase 17 but it didn't help. All firmware NVRAM settings are default. Basically, when we put the disks behind this controller under load (e.g. scrubbing, recursive ls on large ZFS filesystem) we get this series of log entries that appear at random intervals: scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci8086,6...@4/pci1000,3...@0/s...@34,0 (sd49): incomplete read- retrying scsi: [ID 243001 kern.warning] WARNING: /p...@0,0/pci8086,6...@4/pci1000,3...@0 (mpt0): mpt_handle_event_sync: IOCStatus=0x8000, IOCLogInfo=0x31110b00 scsi: [ID 243001 kern.warning] WARNING: /p...@0,0/pci8086,6...@4/pci1000,3...@0 (mpt0): mpt_handle_event: IOCStatus=0x8000, IOCLogInfo=0x31110b00 scsi: [ID 243001 kern.warning] WARNING: /p...@0,0/pci8086,6...@4/pci1000,3...@0 (mpt0): mpt_handle_event_sync: IOCStatus=0x8000, IOCLogInfo=0x31112000 scsi: [ID 243001 kern.warning] WARNING: /p...@0,0/pci8086,6...@4/pci1000,3...@0 (mpt0): mpt_handle_event: IOCStatus=0x8000, IOCLogInfo=0x31112000 scsi: [ID 365881 kern.info] /p...@0,0/pci8086,6...@4/pci1000,3...@0 (mpt0): Log info 0x31110b00 received for target 40. scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc scsi: [ID 365881 kern.info] /p...@0,0/pci8086,6...@4/pci1000,3...@0 (mpt0): Log info 0x31110b00 received for target 40. scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc scsi: [ID 365881 kern.info] /p...@0,0/pci8086,6...@4/pci1000,3...@0 (mpt0): Log info 0x31110b00 received for target 40. scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc scsi: [ID 365881 kern.info] /p...@0,0/pci8086,6...@4/pci1000,3...@0 (mpt0): Log info 0x31110b00 received for target 40. scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci8086,6...@4/pci1000,3...@0/s...@2d,0 (sd42): incomplete read- retrying scsi: [ID 365881 kern.info] /p...@0,0/pci8086,6...@4/pci1000,3...@0 (mpt0): Rev. 8 LSI, Inc. 1068E found. scsi: [ID 365881 kern.info] /p...@0,0/pci8086,6...@4/pci1000,3...@0 (mpt0): mpt0 supports power management. scsi: [ID 365881 kern.info] /p...@0,0/pci8086,6...@4/pci1000,3...@0 (mpt0): mpt0: IOC Operational. It seems to be timing out accessing a disk, retrying, giving up and then doing a bus reset? This is happening with random disks behind the controller and on multiple systems with the same hardware config. We are running snv_118 right now and was hoping this was some magic mpt-related bug that was going to be fixed in snv_125 but it doesn't look like it. The LSI3801E is driving 2 x 23-disk JBOD's which, albeit a dense solution, it should be able to handle. We are also using wide raidz2 vdevs (22 disks each, one per JBOD) which agreeably is slower performance-wise, but the goal here is density not performance. I would have hoped that the system would just slow down if there was IO contention, but not experience things like bus resets. Your thoughts? -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] SNV_125 MPT warning in logfile
Adam Cheal wrote: James: We are running Phase 16 on our LSISAS3801E's, and have also tried the recently released Phase 17 but it didn't help. All firmware NVRAM settings are default. Basically, when we put the disks behind this controller under load (e.g. scrubbing, recursive ls on large ZFS filesystem) we get this series of log entries that appear at random intervals: ... It seems to be timing out accessing a disk, retrying, giving up and then doing a bus reset? This is happening with random disks behind the controller and on multiple systems with the same hardware config. We are running snv_118 right now and was hoping this was some magic mpt-related bug that was going to be fixed in snv_125 but it doesn't look like it. The LSI3801E is driving 2 x 23-disk JBOD's which, albeit a dense solution, it should be able to handle. We are also using wide raidz2 vdevs (22 disks each, one per JBOD) which agreeably is slower performance-wise, but the goal here is density not performance. I would have hoped that the system would just slow down if there was IO contention, but not experience things like bus resets. Your thoughts? ugh. New bug time - bugs.opensolaris.org, please select Solaris / kernel / driver-mpt. In addition to the error messages and description of when you see it, please provide output from cfgadm -lav prtconf -v I'll see that it gets moved to the correct group asap. Cheers, James -- Senior Kernel Software Engineer, Solaris Sun Microsystems http://blogs.sun.com/jmcp http://www.jmcp.homeunix.com/blog ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] zpool getting in a stuck state?
Hey folks! We're using zfs-based file servers for our backups and we've been having some issues as of late with certain situations causing zfs/ zpool commands to hang. Currently, it appears that raid3155 is in this broken state: r...@homiebackup10:~# ps auxwww | grep zfs root 15873 0.0 0.0 4216 1236 pts/2S 15:56:54 0:00 grep zfs root 13678 0.0 0.1 7516 2176 ?S 14:18:00 0:00 zfs list - t filesystem raid3155/angels root 13691 0.0 0.1 7516 2188 ?S 14:18:04 0:00 zfs list - t filesystem raid3155/blazers root 13731 0.0 0.1 7516 2200 ?S 14:18:20 0:00 zfs list - t filesystem raid3155/broncos root 13792 0.0 0.1 7516 2220 ?S 14:18:51 0:00 zfs list - t filesystem raid3155/diamondbacks root 13910 0.0 0.1 7516 2216 ?S 14:19:52 0:00 zfs list - t filesystem raid3155/knicks root 13911 0.0 0.1 7516 2196 ?S 14:19:53 0:00 zfs list - t filesystem raid3155/lions root 13916 0.0 0.1 7516 2220 ?S 14:19:55 0:00 zfs list - t filesystem raid3155/magic root 13933 0.0 0.1 7516 2232 ?S 14:20:01 0:00 zfs list - t filesystem raid3155/mariners root 13966 0.0 0.1 7516 2212 ?S 14:20:11 0:00 zfs list - t filesystem raid3155/mets root 13971 0.0 0.1 7516 2208 ?S 14:20:21 0:00 zfs list - t filesystem raid3155/niners root 13982 0.0 0.1 7516 2220 ?S 14:20:32 0:00 zfs list - t filesystem raid3155/padres root 14064 0.0 0.1 7516 2220 ?S 14:21:03 0:00 zfs list - t filesystem raid3155/redwings root 14123 0.0 0.1 7516 2212 ?S 14:21:20 0:00 zfs list - t filesystem raid3155/seahawks root 14323 0.0 0.1 7420 2184 ?S 14:22:51 0:00 zfs allow zfsrcv create,mount,receive,share raid3155 root 15245 0.0 0.1 7468 2256 ?S 15:17:59 0:00 zfs create raid3155/angels root 15250 0.0 0.1 7468 2244 ?S 15:18:03 0:00 zfs create raid3155/blazers root 15256 0.0 0.1 7468 2248 ?S 15:18:19 0:00 zfs create raid3155/broncos root 15284 0.0 0.1 7468 2256 ?S 15:18:51 0:00 zfs create raid3155/diamondbacks root 15322 0.0 0.1 7468 2260 ?S 15:19:51 0:00 zfs create raid3155/knicks root 15332 0.0 0.1 7468 2260 ?S 15:19:53 0:00 zfs create raid3155/magic root 15333 0.0 0.1 7468 2236 ?S 15:19:53 0:00 zfs create raid3155/lions root 15345 0.0 0.1 7468 2264 ?S 15:20:01 0:00 zfs create raid3155/mariners root 15355 0.0 0.1 7468 2260 ?S 15:20:10 0:00 zfs create raid3155/mets root 15363 0.0 0.1 7468 2252 ?S 15:20:20 0:00 zfs create raid3155/niners root 15368 0.0 0.1 7468 2256 ?S 15:20:33 0:00 zfs create raid3155/padres root 15384 0.0 0.1 7468 2256 ?S 15:21:01 0:00 zfs create raid3155/redwings root 15389 0.0 0.1 7468 2264 ?S 15:21:20 0:00 zfs create raid3155/seahawks attempting to do a zpool list hangs, as does attempting to do a zpool status raid3155. Rebooting the system (forcefully) seems to 'fix' the problem, but once it comes back up, doing a zpool list or zpool status shows no issues with any of the drives. (after a reboot): r...@homiebackup10:~# zpool list NAME SIZE USED AVAILCAP HEALTH ALTROOT raid3066 32.5T 18.1T 14.4T55% ONLINE - raid3154 32.5T 18.2T 14.3T55% ONLINE - raid3155 32.5T 18.7T 13.8T57% ONLINE - raid3156 32.5T 22.0T 10.5T67% ONLINE - rpool 59.5G 14.1G 45.4G23% ONLINE - We are using silmech storform iserv r505 machines with 3x silmech storform D55J jbod sas expanders connected to LSI Logic SAS1068E B3 esas cards all containing 1.5TB seagate 7200.11 sata hard drives. We make a single striped raidz2 pool out of each chassis giving us ~29TB of storage out of each 'brick' and we use rsync to copy the data from the machines to be backed up. 
They're currently running OpenSolaris 2009.06 (snv_111b) We have had issues with the backplanes on these machines, but this particular machine has been up and running for nearly a year without any problems. It's currently at about 50% capacity on all pools. I'm not really sure how to proceed from here as far as getting debug information while it's hung like this. I saw someone with similar issues post a few days ago but don't see any replies. The thread title is [zfs-discuss] Problem with resilvering and faulty disk. We've been seeing that issue as well while rebuilding these drives. Any assistance with this would be greatly appreciated, and any information you folks might need to help troubleshoot this issue I can provide, just let me know what you need! -Jeremy ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
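One more data point I can gather next time it hangs, if it would help: kernel stack traces of the stuck commands, e.g. (15245 below is one of the hung "zfs create" pids from the ps listing above):

# echo "0t15245::pid2proc | ::walk thread | ::findstack -v" | mdb -k
# echo "::threadlist -v" | mdb -k > /var/tmp/threads.out

I'm assuming that's the sort of output that's useful for this kind of problem -- corrections welcome.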
Re: [zfs-discuss] SNV_125 MPT warning in logfile
I've filed the bug, but was unable to include the prtconf -v output as the comments field only accepted 15000 chars total. Let me know if there is anything else I can provide/do to help figure this problem out as it is essentially preventing us from doing any kind of heavy IO to these pools, including scrubbing. -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS disk failure question
Thank you for your follow-up. The doc looks great. Having good examples goes a long way to helping others that have my problem. Ideally, the replacement would all happen magically, and I would have had everything marked as good, with one failed disk (like a certain other storage vendor that has it's beefs with Sun does). But, I can live with detaching them if I have to. Another thing that would be nice would be to receive notification of disk failures from the OS via email or SMS (like the vendor I previously alluded to), but I know I'm talking crazy now. Jason On Thu, Oct 22, 2009 at 2:15 PM, Cindy Swearingen cindy.swearin...@sun.com wrote: Hi Jason, Since spare replacement is an important process, I've rewritten this section to provide 3 main examples, here: http://docs.sun.com/app/docs/doc/817-2271/gcvcw?a=view Scroll down the section: Activating and Deactivating Hot Spares in Your Storage Pool Example 4–7 Manually Replacing a Disk With a Hot Spare Example 4–8 Detaching a Hot Spare After the Failed Disk is Replaced Example 4–9 Detaching a Failed Disk and Using the Hot Spare The third example is your scenario. I finally listened to the answer, which is you must detach the original disk if you want to continue to use the spare and replace the original disk later. It all works as described. I see some other improvements coming with spare replacement and will provide details when they are available. Thanks, Cindy On 10/14/09 15:54, Jason Frank wrote: See, I get overly literal when working on failed production storage (and yes, I do have backups...) I wasn't wanting to cancel the in-progress spare replacement. I had a completed spare replacement, and I wanted to make it official. So, that didn't really fit my scenario either. I'm glad you agree on the brevity of the detach subcommand man page. I would guess that the intricacies of the failure modes would probably lend itself to richer content than a man page. I'd really like to see some kind of web based wizard to walk through it I doubt I'd get motivated to write it myself though. The web page Cindy pointed to does not cover how to make the replacement official either. It gets close. But at the end, it detaches the hot spare, and not the original disk. Everything seems to be close, but not quite there. Of course, now that I've been through this once, I'll remember all. I'm just thinking of the children. Also, I wanted to try and reconstruct all of my steps from zpool history -i tank. According to that, zpool decided to replace t7 with t11 this morning (why wasn't it last night?), and I offlined, onlined and detach of t7 and I was OK. I did notice that the history records internal scrubs, but not resilvers, It also doesn't record failed commands, or disk failures in a zpool. It would be sweet to have a line that said something like marking vdev /dev/dsk/c8t7d0s0 as UNAVAIL due to X read errors in Y minutes, Then we can really see what happened. Jason On Wed, Oct 14, 2009 at 4:32 PM, Eric Schrock eric.schr...@sun.com wrote: On 10/14/09 14:26, Jason Frank wrote: Thank you, that did the trick. That's not terribly obvious from the man page though. The man page says it detaches the devices from a mirror, and I had a raidz2. Since I'm messing with production data, I decided I wasn't going to chance it when I was reading the man page. You might consider changing the man page, and explaining a little more what it means, maybe even what the circumstances look like where you might use it. 
This is covered in the Hot Spares section of the manpage: An in-progress spare replacement can be cancelled by detach- ing the hot spare. If the original faulted device is detached, then the hot spare assumes its place in the confi- guration, and is removed from the spare list of all active pools. It is true that the description for zpool detach is overly brief and could be expanded to include this use case. - Eric -- Eric Schrock, Fishworks http://blogs.sun.com/eschrock ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] cannot import 'rpool': one or more devices is currently unavailable
I have a system who's rpool has gone defunct. The rpool is made of a single disk which is a raid5EE made of all 8 146G disks on the box. The raid card is the Adaptec brand card. It was running nv_107, but its currently net booted to nv_121. I have already checked in the raid card bios, and it says the volume is optimal . We had a power outage in BRM07 on Tuesday, and the system appeared to boot back up, but then went wonky. I power cycled it, and it came back to a grub prompt cause it couldn't read the filesystem. # uname -a SunOS 5.11 snv_121 i86pc i386 i86pc # zpool import pool: rpool id: 7197437773913332097 state: ONLINE status: The pool was last accessed by another system. action: The pool can be imported using its name or numeric identifier and the '-f' flag. see: http://www.sun.com/msg/ZFS-8000-EY config: rpool ONLINE c0t0d0s0 ONLINE # zpool import -f 7197437773913332097 cannot import 'rpool': one or more devices is currently unavailable # # zpool import -a -f -R /a cannot import 'rpool': one or more devices is currently unavailable # zdb -l /dev/dsk/c0t0d0s0 LABEL 0 version=14 name='rpool' state=0 txg=742622 pool_guid=7197437773913332097 hostid=4930069 hostname='' top_guid=5620634672424557591 guid=5620634672424557591 vdev_tree type='disk' id=0 guid=5620634672424557591 path='/dev/dsk/c0t0d0s0' devid='id1,s...@tsun_stk_raid_intefd1dfe0/a' phys_path='/p...@0,0/pci8086,3...@4/pci108e,2...@0/d...@0,0:a' whole_disk=0 metaslab_array=24 metaslab_shift=33 ashift=9 asize=880083730432 is_log=0 LABEL 1 version=14 name='rpool' state=0 txg=742622 pool_guid=7197437773913332097 hostid=4930069 hostname='' top_guid=5620634672424557591 guid=5620634672424557591 vdev_tree type='disk' id=0 guid=5620634672424557591 path='/dev/dsk/c0t0d0s0' devid='id1,s...@tsun_stk_raid_intefd1dfe0/a' phys_path='/p...@0,0/pci8086,3...@4/pci108e,2...@0/d...@0,0:a' whole_disk=0 metaslab_array=24 metaslab_shift=33 ashift=9 asize=880083730432 is_log=0 LABEL 2 version=14 name='rpool' state=0 txg=742622 pool_guid=7197437773913332097 hostid=4930069 hostname='' top_guid=5620634672424557591 guid=5620634672424557591 vdev_tree type='disk' id=0 guid=5620634672424557591 path='/dev/dsk/c0t0d0s0' devid='id1,s...@tsun_stk_raid_intefd1dfe0/a' phys_path='/p...@0,0/pci8086,3...@4/pci108e,2...@0/d...@0,0:a' whole_disk=0 metaslab_array=24 metaslab_shift=33 ashift=9 asize=880083730432 is_log=0 LABEL 3 version=14 name='rpool' state=0 txg=742622 pool_guid=7197437773913332097 hostid=4930069 hostname='' top_guid=5620634672424557591 guid=5620634672424557591 vdev_tree type='disk' id=0 guid=5620634672424557591 path='/dev/dsk/c0t0d0s0' devid='id1,s...@tsun_stk_raid_intefd1dfe0/a' phys_path='/p...@0,0/pci8086,3...@4/pci108e,2...@0/d...@0,0:a' whole_disk=0 metaslab_array=24 metaslab_shift=33 ashift=9 asize=880083730432 is_log=0 # zdb -cu -e -d /dev/dsk/c0t0d0s0 zdb: can't open /dev/dsk/c0t0d0s0: No such file or directory # zdb -e rpool -cu zdb: can't open rpool: No such device or address # zdb -e 7197437773913332097 zdb: can't open 7197437773913332097: No such device or address # I obviously have no clue how to weild zdb. Any help you can offer would be appreciated. Thanks, Tommy ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
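Things I was planning to try next, assuming I have the zdb syntax right this time (my understanding is that the options must come before the pool name, and that -p points zdb at the directory containing the device nodes -- please correct me if that's wrong):

# zdb -e -p /dev/dsk rpool
# zdb -e -p /dev/dsk -bcsv rpool       # traverse and verify blocks, if the pool opens at all

But I'd still appreciate guidance before poking at it any further.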
Re: [zfs-discuss] ZFS disk failure question
On Oct 22, 2009, at 12:29 PM, Jason Frank wrote:
> Thank you for your follow-up. The doc looks great. Having good examples goes
> a long way to helping others that have my problem. Ideally, the replacement
> would all happen magically, and I would have had everything marked as good,
> with one failed disk (like a certain other storage vendor that has its beefs
> with Sun does). But, I can live with detaching them if I have to.

The zpool autoreplace property manages the policy for automatic replacement in ZFS. I presume it will work for most cases, but am less sure when a RAID controller hides the disk from the OS behind a volume. Does anyone have direct experience with this?

> Another thing that would be nice would be to receive notification of disk
> failures from the OS via email or SMS (like the vendor I previously alluded
> to), but I know I'm talking crazy now.

Configure an SNMP monitor to do as you wish. FMA generates SNMP traps when something like that occurs. Solaris ships with net-snmp, see snmpd(1m) for more info.
 -- richard
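P.S. If email really is the goal, the low-tech fallback is a cron job wrapped around "zpool status -x"; a rough sketch (schedule and recipient obviously made up):

#!/bin/sh
# mail root whenever any pool is not healthy; run from cron every 15 minutes or so
status=`/usr/sbin/zpool status -x`
if [ "$status" != "all pools are healthy" ]; then
        echo "$status" | mailx -s "zpool fault on `hostname`" root
fi

The SNMP trap route via FMA remains the more robust option.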
Re: [zfs-discuss] SNV_125 MPT warning in logfile
On 10/22/09 4:07 PM, James C. McPherson wrote: Adam Cheal wrote: It seems to be timing out accessing a disk, retrying, giving up and then doing a bus reset? ... ugh. New bug time - bugs.opensolaris.org, please select Solaris / kernel / driver-mpt. In addition to the error messages and description of when you see it, please provide output from cfgadm -lav prtconf -v I'll see that it gets moved to the correct group asap. FYI this is very similar to the behaviour I was seeing with my directly attached SATA disks on snv_118 (see the list archives for my original messages). I have not yet seen the error since I replaced my Hitachi 500 GB disks for Seagate 1.5TB disks, so it could very well have been some unfortunate LSI firmware / Hitachi drive firmware interaction. carson:gandalf 0 $ gzcat /var/adm/messages.2.gz | ggrep -4 mpt | tail -9 Oct 8 00:44:17 gandalf.taltos.org scsi: [ID 365881 kern.notice] /p...@0,0/pci8086,2...@1c/pci1000,3...@0 (mpt0): Oct 8 00:44:17 gandalf.taltos.org Log info 0x3113 received for target 1. Oct 8 00:44:17 gandalf.taltos.org scsi_status=0x0, ioc_status=0x8048, scsi_state=0xc Oct 8 00:44:17 gandalf.taltos.org scsi: [ID 365881 kern.notice] /p...@0,0/pci8086,2...@1c/pci1000,3...@0 (mpt0): Oct 8 00:44:17 gandalf.taltos.org Log info 0x3113 received for target 1. Oct 8 00:44:17 gandalf.taltos.org scsi_status=0x0, ioc_status=0x8048, scsi_state=0xc Oct 8 00:44:17 gandalf.taltos.org scsi: [ID 365881 kern.notice] /p...@0,0/pci8086,2...@1c/pci1000,3...@0 (mpt0): Oct 8 00:44:17 gandalf.taltos.org Log info 0x3113 received for target 1. Oct 8 00:44:17 gandalf.taltos.org scsi_status=0x0, ioc_status=0x8048, scsi_state=0xc carson:gandalf 1 $ gzcat /var/adm/messages.2.gz | sed -ne 's,^.*\(Log info\),\1,p' | sort -u Log info 0x31110b00 received for target 7. Log info 0x3113 received for target 0. Log info 0x3113 received for target 1. Log info 0x3113 received for target 2. Log info 0x3113 received for target 3. Log info 0x3113 received for target 4. Log info 0x3113 received for target 6. Log info 0x3113 received for target 7. Log info 0x3114 received for target 0. Log info 0x3114 received for target 1. Log info 0x3114 received for target 2. Log info 0x3114 received for target 3. Log info 0x3114 received for target 4. Log info 0x3114 received for target 6. Log info 0x3114 received for target 7. carson:gandalf 0 $ gzcat /var/adm/messages.2.gz | sed -ne 's,^.*\(scsi_status\),\1,p' | sort -u scsi_status=0x0, ioc_status=0x8048, scsi_state=0xc scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc -- Carson ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] moving files from one fs to another, splittin/merging
On 21/10/2009, at 7:39 AM, Mike Bo wrote: Once data resides within a pool, there should be an efficient method of moving it from one ZFS file system to another. Think Link/Unlink vs. Copy/Remove. I agree with this sentiment, it's certainly a surprise when you first notice. Here's my scenario... When I originally created a 3TB pool, I didn't know the best way carve up the space, so I used a single, flat ZFS file system. Now that I'm more familiar with ZFS, managing the sub- directories as separate file systems would have made a lot more sense (seperate policies, snapshots, etc.). The problem is that some of these directories contain tens of thousands of files and many hundreds of gigabytes. Copying this much data between file systems within the same disk pool just seems wrong. I hope such a feature is possible and not too difficult to implement, because I'd like to see this capability in ZFS. It doesn't seem unreasonable. It seems like the different properties available on the given datasets (recordsize, checksum, compression, encryption, copies, version, utf8only, casesensitivity) would have to match, or else fall back to blind copying? Regards, mikebo -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss