Re: [zfs-discuss] Extremely slow raidz resilvering
Brandon, Thanks for replying to the message. I believe this is more related to the variable stripe size of RAIDZ than to the fdisk MBR. I say this because the disk works without any issues in a mirror configuration or standalone, reaching 80 MB/s burst transfer rates. In RAIDZ, however, the transfer rates are on the order of KB/s. Of course, the variable stripe size cannot respect any I/O alignment when the disk's firmware does not expose the real 4K sector size, and thus the performance is horrible. Besides, the disk has an EFI label:

Total disk size is 60800 cylinders
Cylinder size is 32130 (512 byte) blocks

                                   Cylinders
  Partition   Status   Type        Start    End     Length   %
  =========   ======   ====        =====    =====   ======   ===
      1                EFI             0    60800    60801   100

...that uses the whole disk. prtvtoc command output:

* /dev/rdsk/c12t0d0 partition map
*
* Dimensions:
*        512 bytes/sector
* 1953525168 sectors
* 1953525101 accessible sectors
*
* Flags:
*   1: unmountable
*  10: read-only
*
* Unallocated space:
*       First        Sector       Last
*       Sector       Count        Sector
*       34           222          255
*
*                          First        Sector       Last
* Partition  Tag  Flags    Sector       Count        Sector      Mount Directory
          0    4    00        256   1953508495   1953508750
          8   11    00 1953508751        16384   1953525134

Though there's an MBR in there (you can check it out with dd), I know that it doesn't affect the alignment, because the usable slice starts at sector 256 and, being a multiple of 8, it maintains the 4K physical sector alignment. The situation would be different if the usable slice had started at sector 1, right after the MBR in sector 0: logical sectors 0 through 7 belong to the same 4K physical sector, and shifting everything by an offset of one would definitely alter the I/O. To make this clearer for those reading about this for the first time: if the logical and physical layouts are not aligned, an operation on one logical stripe/cluster can partially land on another physical sector, and the extra read-modify-write overhead deteriorates the final performance.

What would be awesome is to trace all of the I/O access that ZFS does on the pool and try to match that to the physical layout. I already saw someone's work running a DTrace script that records all the accesses and then creates an animation (black and green sectors) showing the activity on the disk. An incredibly awesome piece of work. I can't find that link right now. That script would throw some light on this. Regards, Leandro.

On Thu, May 20, 2010 at 8:53 PM, Brandon High bh...@freaks.com wrote: On Sat, Apr 24, 2010 at 5:02 PM, Leandro Vanden Bosch l.vbo...@gmail.com wrote: Confirmed then that the issue was with the WD10EARS. I swapped it out with the old one and things look a lot better: The problem with the EARS drive is that it was not 4k aligned. The solaris partition table was, but that does not take into account the fdisk MBR. As a result, everything was off by one cylinder. -B -- Brandon High : bh...@freaks.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
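A minimal sketch of that kind of tracing, short of the animation: the DTrace io provider can at least tally whether each I/O issued to the suspect disk starts on a 4 KB boundary. The sd instance name below is an assumption (check which one the EARS disk maps to with iostat -xn); b_blkno is counted in 512-byte blocks, so a multiple of 8 is 4K-aligned, and because the slice itself starts at sector 256 (also a multiple of 8) slice-relative alignment matches the physical one.

# dtrace -n '
io:::start
/args[1]->dev_statname == "sd2"/
{
        /* tally I/O start alignment and size granularity */
        @starts[args[0]->b_blkno % 8 ? "start misaligned" : "start 4k-aligned"] = count();
        @sizes[args[0]->b_bcount % 4096 ? "size not a 4k multiple" : "size 4k multiple"] = count();
}'

Run it while copying a large file into the RAIDZ pool; the counters should show whether the variable-width writes really are straddling the 4K boundaries.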
[zfs-discuss] Tank zpool has tanked out :(
Hi there, My zpool tank has been chugging along nicely, but after a failed attempt at offlining a misbehaving drive I've got a weird situation.

  pool: tank
 state: DEGRADED
status: One or more devices has been taken offline by the administrator. Sufficient replicas exist for the pool to continue functioning in a degraded state.
action: Online the device using 'zpool online' or replace the device with 'zpool replace'.
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        tank        DEGRADED     0     0     0
          raidz1-0  DEGRADED     0     0     0
            c7t0d0  ONLINE       0     0     0
            c7t4d0  ONLINE       0     0     0
            c7t1d0  ONLINE       0     0     0
            c7t3d0  ONLINE       0     0     0
            c7t2d0  ONLINE       0     0     0
            c7t3d0  OFFLINE      0     0     0

errors: No known data errors

Why does that particular drive appear twice? I am on SNV_134. They are 6x 500GB Western Digital RE drives. I have a spare 500GB on another controller (c0t0d0) which I want to use to replace the (probably dying) drive, but I'm not sure I can do this and have it correctly remove the one I want:

# zpool replace tank c7t3d0 c0t0d0

Any ideas? Gracias, Andre ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Very serious performance degradation
...and let the resilver complete. -- richard

Hi !

 pool: zfs_raid
state: ONLINE
scrub: resilver completed after 16h34m with 0 errors on Fri May 21 05:39:42 2010
config:

        NAME        STATE     READ WRITE CKSUM
        zfs_raid    ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            c7t5d0  ONLINE       0     0     0
            c7t4d0  ONLINE       0     0     0
            c7t3d0  ONLINE       0     0     0
            c7t2d0  ONLINE       0     0     0  326G resilvered

errors: No known data errors

Now, I just have to do the same drive replacement for the 2 other failing drives... Many thanks to you all ! Philippe -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Very serious performance degradation
Now, I just have to do the same drive replacement for the 2 other failing drives... For information, current iostat results:

                     extended device statistics                       ---- errors ----
   r/s    w/s    kr/s    kw/s  wait  actv  wsvc_t  asvc_t  %w  %b  s/w  h/w  trn  tot  device
   0.0    0.0     0.0     0.0   0.0   0.0     0.0     0.0   0   0    0   11    0   11  c8t0d0
  44.9    0.0  5738.1     0.0   0.0   1.4     0.0    32.2   0   5    0    0    0    0  c7t0d0
   0.4  241.9     3.2  1172.2   0.0   2.7     0.0    11.1   0  10    0    0    0    0  c7t2d0
   0.4   31.2     3.2   846.2   0.0  28.2     0.0   891.2   0  89    0    0    0    0  c7t3d0
   0.3   18.5     1.2   576.3   0.0   7.5     0.0   398.4   0  24    0    0    0    0  c7t4d0
   0.3   38.4     2.5  1289.1   0.0   0.8     0.0    19.6   0   4    0    0    0    0  c7t5d0

                     extended device statistics                       ---- errors ----
   r/s    w/s    kr/s    kw/s  wait  actv  wsvc_t  asvc_t  %w  %b  s/w  h/w  trn  tot  device
   0.0    0.0     0.0     0.0   0.0   0.0     0.0     0.0   0   0    0   11    0   11  c8t0d0
   0.0    0.0     3.2     0.0   0.0   0.0     0.0     4.2   0   0    0    0    0    0  c7t0d0
   0.0    0.0     0.1     0.0   0.0   0.0     0.0     9.7   0   0    0    0    0    0  c7t2d0
   0.0   27.6     0.1   701.3   0.0  35.0     0.0  1269.2   0 100    0    0    0    0  c7t3d0
   0.0   19.6     0.1   713.0   0.0  20.9     0.0  1066.5   0  61    0    0    0    0  c7t4d0
   0.0    0.0     0.0     0.0   0.0   0.0     0.0     0.0   0   0    0    0    0    0  c7t5d0

I really have to hurry up and replace c7t3d0 too (and c7t4d0 next), because they are failing rapidly (compared to the results of yesterday)!! Have a nice week-end, Philippe -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
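For anyone who wants the same view on their own box: the exact flags Philippe used aren't shown, but an invocation along these lines produces those columns, including the per-device error counters (the first report is an average since boot; only the later samples are live). The numbers that give a failing drive away are asvc_t (average service time, in ms) and %b (percent busy):

# iostat -xne 30

A healthy, lightly loaded SATA disk usually sits in the single digits to low tens of ms for asvc_t; the 800-1200 ms figures on c7t3d0/c7t4d0 above are what "dying, but not yet throwing hard errors" tends to look like.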
Re: [zfs-discuss] New SSD options
On the PCIe side, I noticed there's a new card coming from LSI that claims 150,000 4K random write IOPS. Unfortunately this might end up being an OEM-only card. I also noticed on the ddrdrive site that they now have an OpenSolaris driver and are offering it in a beta program. -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Very serious performance degradation
Hi, I know that ZFS is aware of I/O errors, and can alert on or disable a crappy disk. However, ZFS didn't notice these service time problems at all. I think it would be a good idea to integrate service time triggers into ZFS ! What do you think ? Best regards ! Philippe -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
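Until something like that exists inside ZFS (or FMA), a crude external watchdog is easy to cobble together. The sketch below is only an illustration with made-up thresholds: it takes iostat samples and complains about any device whose average service time is above 200 ms while the device is more than 50% busy. The warning could just as easily be piped to logger or mail.

#!/bin/sh
# warn about devices with pathological service times
# thresholds (200 ms asvc_t, 50 %b) are arbitrary examples
# (the first of the two reports is a since-boot average; a smarter script would skip it)
iostat -xn 30 2 | awk '
    $1 ~ /^[0-9.]+$/ && $8 + 0 > 200 && $10 + 0 > 50 {
        printf("WARNING: %s asvc_t=%s ms, %s%% busy\n", $11, $8, $10)
    }'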
Re: [zfs-discuss] Interesting experience with Nexenta - anyone seen it?
If you do not care about this NFS problem (or the others) then maybe you can just disable the ZIL. It is a matter of working through step 1. Working through STEP 1 might be ``doesn't affect us. Disable ZIL.'' Or it might be ``get slog with supercap''. STEP 1 will never be ``plug in OCZ Vertex cheaposlog that ignores cacheflush'' if you are doing it right. And Step 2 has nothing to do with anything yet until we finish STEP 1 and the insane failure cases. AFAIK OCZ Vertex 2 does not use volatile DRAM cache but non-volatile NAND grid. Whether it respects or ignores the cache flush seems irrelevant. There has been previous discussion about this: http://comments.gmane.org/gmane.os.solaris.opensolaris.zfs/35702 I'm pretty sure that all SandForce-based SSDs don't use DRAM as their cache, but take a hunk of flash to use as scratch space instead. Which means that they'll be OK for ZIL use. Also: http://www.techspot.com/news/37729-ocz-vertex-2-pro-100gb-ssd-review.html Another benefit of SandForce's architecture is that the SSD keeps information on the NAND grid and removes the need for a separate cache buffer DRAM module. The result is a faster transaction, albeit at the expense of total storage capacity. So if I interpret them correctly, what they chose to do with the current incarnation of the architecture is actually reserve some of the primary memory capacity for I/O transaction management. In plain English, if the system gets interrupted either by power or by a crash, when it initializes the next time, it can read from its transaction space and resume where it left off. This makes it durable. So, OCZ Vertex 2 seems to be a good choice for ZIL. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Very serious performance degradation
Hi, Actually, it seems to be a common problem with WD EARS drives (advanced format) ! Please see this other OpenSolaris thread: https://opensolaris.org/jive/thread.jspa?threadID=126637 It is worth investigating ! I quote:

Just replacing back, and here is the iostat for the new EARS drive: http://pastie.org/889572 Those asvc_t's are atrocious. As is the op/s throughput. All the other drives spend the vast majority of the time idle, waiting for the new EARS drive to write out data. This is after isolating another issue to my Dell PERC 5/i's - they apparently don't talk nicely with the EARS drives either. Streaming writes would push data for two seconds and pause for ten. Random writes ... give up. On the Intel chipset's SATA - streaming writes are acceptable, but random writes are as per the above URL. Format tells me that the partition starts at sector 256. But given that ZFS writes variable-size blocks, that really shouldn't matter. When I plugged the EARS drive into a P45-based motherboard running Windows, HDTune presents a normal-looking streaming writes graph, and the average seek time is 14ms - the drive seems healthy.

-- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
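As a quick sanity check of the label side of this (it says nothing about what the firmware is hiding), the starting sector of every slice can be pulled out of prtvtoc and tested for divisibility by 8 (8 x 512 bytes = 4 KB). The device name is just an example; point it at the suspect disk:

# prtvtoc /dev/rdsk/c12t0d0 | awk '!/^\*/ { print "slice", $1, "starts at sector", $4, ($4 % 8 == 0 ? "(4k-aligned)" : "(NOT 4k-aligned)") }'

A start sector of 256, as reported above, passes this test, which is why suspicion keeps falling back on the drive lying about its sector size rather than on the partitioning.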
Re: [zfs-discuss] New SSD options
On May 20, 2010, at 7:17 PM, Ragnar Sundblad ra...@csc.kth.se wrote: On 21 maj 2010, at 00.53, Ross Walker wrote: On May 20, 2010, at 6:25 PM, Travis Tabbal tra...@tabbal.net wrote: use a slog at all if it's not durable? You should disable the ZIL instead. This is basically where I was going. There only seems to be one SSD that is considered working, the Zeus IOPS. Even if I had the money, I can't buy it. As my application is a home server, not a datacenter, things like NFS breaking if I don't reboot the clients is a non-issue. As long as the on-disk data is consistent so I don't have to worry about the entire pool going belly-up, I'm happy enough. I might lose 30 seconds of data, worst case, as a result of running without ZIL. Considering that I can't buy a proper ZIL at a cost I can afford, and an improper ZIL is not worth much, I don't see a reason to bother with ZIL at all. I'll just get a cheap large SSD for L2ARC, disable ZIL, and call it a day. For my use, I'd want a device in the $200 range to even consider an slog device. As nothing even remotely close to that price range exists that will work properly at all, let alone with decent performance, I see no point in ZIL for my application. The performance hit is just too severe to continue using it without an slog, and there's no slog device I can afford that works properly, even if I ignore performance. Just buy a caching RAID controller and run it in JBOD mode and have the ZIL integrated with the pool. A 512MB-1024MB card with battery backup should do the trick. It might not have the capacity of an SSD, but in my experience it works well in the 1TB data moderately loaded range. Have more data/activity then try more cards and more pools, otherwise pony up the for a capacitor backed SSD. It - again - depends on what problem you are trying to solve. If the RAID controller goes bad on you so that you loose the data in the write cache, your file system could be in pretty bad shape. Most RAID controllers can't be mirrored. That would hardly make a good replacement for a mirrored ZIL. As far as I know, there is no single silver bullet to this issue. That is true, and there at finite budgets as well and as all things in life one must make a trade-off somewhere. If you have 2 mirrored SSDs that don't support cache flush and your power goes out your file system will be in the same bad shape. Difference is in the first place you paid a lot less to have your data hosed. -Ross ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Interesting experience with Nexenta - anyone seen it?
AFAIK OCZ Vertex 2 does not use volatile DRAM cache but non-volatile NAND grid. Whether it respects or ignores the cache flush seems irrelevant. There has been previous discussion about this: http://comments.gmane.org/gmane.os.solaris.opensolaris.zfs/35702 I'm pretty sure that all SandForce-based SSDs don't use DRAM as their cache, but take a hunk of flash to use as scratch space instead. Which means that they'll be OK for ZIL use. Also: http://www.techspot.com/news/37729-ocz-vertex-2-pro-100gb-ssd-review.html Another benefit of SandForce's architecture is that the SSD keeps information on the NAND grid and removes the need for a separate cache buffer DRAM module. The result is a faster transaction, albeit at the expense of total storage capacity. So if I interpret them correctly, what they chose to do with the current incarnation of the architecture is actually reserve some of the primary memory capacity for I/O transaction management. In plain English, if the system gets interrupted either by power or by a crash, when it initializes the next time, it can read from its transaction space and resume where it left off. This makes it durable. Here is a detailed explanation of the SandForce controllers: http://www.anandtech.com/show/3661/understanding-sandforces-sf1200-sf1500-not-all-drives-are-equal So the SF-1500 is enterprise class and relies on a supercap, the SF-1200 is consumer class and does not rely on a supercap. The SF-1200 firmware on the other hand doesn’t assume the presence of a large capacitor to keep the controller/NAND powered long enough to complete all writes in the event of a power failure. As such it does more frequent check pointing and doesn’t guarantee the write in progress will complete before it’s acknowledged. As I understand it, the SF-1200 will ack the sync write only after it is written to flash thus reducing write performance. There is an interesting part about firmwares and OCZ having an exclusive firmware in the Vertex 2 series which based on the SF-1200 but its random write IOPS is not capped at 10K (while other vendors and other SSDs from OCZ using the SF-1200 are capped, unless they sell the drive with the RC firmware which is for OEM evaluation and not production ready but does not contain the IOPS cap). ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] send/recv over ssh
On Thu, May 20, 2010 19:44, Freddie Cash wrote: And you can always patch OpenSSH with HPN, thus enabling the NONE cipher, which disable encryption for the data transfer (authentication is always encrypted). And twiddle the internal buffers that OpenSSH uses to improve transfer rates, especially on 100 Mbps or faster links. Ah! I've been wanting that for YEARS. Very glad to hear somebody has done it. With the common use of SSH for for moving bulk data (under rsync as well), this is a really useful idea. Of course one should think about where one is moving one's data unencrypted; but the precise cases where the performance hit of encryption will show are the safe ones, such as between my desktop and server which are plugged into the same switch; no data would leave that small LAN segment. -- David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/ Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/ Photos: http://dd-b.net/photography/gallery/ Dragaera: http://dragaera.info ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Interesting experience with Nexenta - anyone seen it?
SNIP a whole lot of ZIL/SLOG discussion

Hi guys. Yep, I know about the ZIL and SSD slogs. While setting Nexenta up it offered to disable the ZIL entirely. For now I left it on. In the end (hopefully only for specific filesystems, once that feature is released) I'll end up disabling the ZIL for our software builds since:

1) The builds are disposable - we only need to save them if they finish, and we can restart them if needed.
2) The build servers are not on UPS, so a power failure is likely to make the clients lose all state and need to restart anyway.

But this issue I've seen with Nexenta is not due to the ZIL. It runs until it literally crashes the machine. It's not just slow, it brings the machine to its knees. I believe it does have something to do with exhausting memory, though. As Erast says it may be the IPS driver (though I've used that on b130 of SXCE without issues), or who knows what else. I did download some updates from Nexenta yesterday. I'm going to try to retest today or tomorrow. -Kyle ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
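For reference, since the per-dataset control Kyle is waiting for isn't released yet: today the switch is a global tunable, and it only changes synchronous-write semantics, not on-disk consistency. A sketch, to be used only after the "step 1" analysis earlier in the thread really says the data is disposable (as far as I recall the live change only takes effect for datasets mounted after it, so a remount is usually needed):

# persistent, applied at next boot
echo "set zfs:zil_disable = 1" >> /etc/system

# or on the live kernel, until the next reboot
echo "zil_disable/W0t1" | mdb -kw

The per-filesystem replacement being worked on is expected to surface as a "sync" dataset property (e.g. zfs set sync=disabled tank/builds), which would let the build filesystems run loose while everything else keeps normal ZIL behaviour.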
Re: [zfs-discuss] Interesting experience with Nexenta - anyone seen it?
On Fri, 21 May 2010, Miika Vesti wrote: AFAIK OCZ Vertex 2 does not use volatile DRAM cache but non-volatile NAND grid. Whether it respects or ignores the cache flush seems irrelevant. There has been previous discussion about this: http://comments.gmane.org/gmane.os.solaris.opensolaris.zfs/35702 I'm pretty sure that all SandForce-based SSDs don't use DRAM as their cache, but take a hunk of flash to use as scratch space instead. Which means that they'll be OK for ZIL use. So, OCZ Vertex 2 seems to be a good choice for ZIL. There seem to be quite a lot of blind assumptions in the above. The only good choice for ZIL is when you know for a certainty and not assumptions based on 3rd party articles and blog postings. Otherwise it is like assuming that if you jump through an open window that there will be firemen down below to catch you. Bob -- Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer,http://www.GraphicsMagick.org/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zpool replace lockup / replace process now stalled, how to fix?
For the record, in case anyone else experiences this behaviour: I tried various things which failed, and finally as a last ditch effort, upgraded my freebsd, giving me zpool v14 rather than v13 - and now it's resilvering as it should. Michael On Monday 17 May 2010 09:26:23 Michael Donaghy wrote: Hi, I recently moved to a freebsd/zfs system for the sake of data integrity, after losing my data on linux. I've now had my first hard disk failure; the bios refused to even boot with the failed drive (ad18) connected, so I removed it. I have another drive, ad16, which had enough space to replace the failed one, so I partitioned it and attempted to use zpool replace to replace the failed partitions for new ones, i.e. zpool replace tank ad18s1d ad16s4d. This seemed to simply hang, with no processor or disk use; any zpool status commands also hung. Eventually I attempted to reboot the system, which also eventually hung; after waiting a while, having no other option, rightly or wrongly, I hard-rebooted. Exactly the same behaviour happened with the other zpool replace. Now, my zpool status looks like: arcueid ~ $ zpool status pool: tank state: DEGRADED scrub: none requested config: NAME STATE READ WRITE CKSUM tank DEGRADED 0 0 0 raidz2 DEGRADED 0 0 0 ad4s1d ONLINE 0 0 0 ad6s1d ONLINE 0 0 0 ad9s1d ONLINE 0 0 0 ad17s1dONLINE 0 0 0 replacing DEGRADED 0 0 0 ad18s1d UNAVAIL 0 9.62K 0 cannot open ad16s4d ONLINE 0 0 0 ad20s1dONLINE 0 0 0 raidz2 DEGRADED 0 0 0 ad4s1e ONLINE 0 0 0 ad6s1e ONLINE 0 0 0 ad17s1eONLINE 0 0 0 replacing DEGRADED 0 0 0 ad18s1e UNAVAIL 0 11.2K 0 cannot open ad16s4e ONLINE 0 0 0 ad20s1eONLINE 0 0 0 errors: No known data errors It looks like the replace has taken in some sense, but ZFS doesn't seem to be resilvering as it should. Attempting to zpool offline doesn't work: arcueid ~ # zpool offline tank ad18s1d cannot offline ad18s1d: no valid replicas Attempting to scrub causes a similar hang to before. Data is still readable (from the zvol which is the only thing actually on this filesystem), although slowly. What should I do to recover this / trigger a proper replace of the failed partitions? Many thanks, Michael ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
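One footnote for anyone who finds this thread with the same symptom: once the resilver does complete, the "replacing" vdev should collapse away on its own; if the old, unavailable device ever lingers in zpool status afterwards, my understanding is it can be dropped explicitly with a detach (device names as in the status output above):

zpool detach tank ad18s1d
zpool detach tank ad18s1e

This is only for after the new devices have fully resilvered; before that point ZFS refuses, much like the offline attempt above was refused with "no valid replicas".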
Re: [zfs-discuss] Interesting experience with Nexenta - anyone seen it?
On Fri, May 21, 2010 10:19, Bob Friesenhahn wrote: On Fri, 21 May 2010, Miika Vesti wrote: AFAIK OCZ Vertex 2 does not use volatile DRAM cache but non-volatile NAND grid. Whether it respects or ignores the cache flush seems irrelevant. There has been previous discussion about this: http://comments.gmane.org/gmane.os.solaris.opensolaris.zfs/35702 I'm pretty sure that all SandForce-based SSDs don't use DRAM as their cache, but take a hunk of flash to use as scratch space instead. Which means that they'll be OK for ZIL use. So, OCZ Vertex 2 seems to be a good choice for ZIL. There seem to be quite a lot of blind assumptions in the above. The only good choice for ZIL is when you know for a certainty and not assumptions based on 3rd party articles and blog postings. Otherwise it is like assuming that if you jump through an open window that there will be firemen down below to catch you. Just how DOES one know something for a certainty, anyway? I've seen LOTS of people mess up performance testing in ways that gave them very wrong answers; relying solely on your own testing is as foolish as relying on a couple of random blog posts. To be comfortable (I don't ask for know for a certainty; I'm not sure that exists outside of faith), I want a claim by the manufacturer and multiple outside tests in significant journals -- which could be the blog of somebody I trusted, as well as actual magazines and such. Ideally, certainly if it's important, I'd then verify the tests myself. There aren't enough hours in the day, so I often get by with less. -- David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/ Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/ Photos: http://dd-b.net/photography/gallery/ Dragaera: http://dragaera.info ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Interesting experience with Nexenta - anyone seen it?
This is interesting. I thought all Vertex 2 SSDs were good choices for ZIL, but this does not seem to be the case. According to http://www.legitreviews.com/article/1208/1/ Vertex 2 LE, Vertex 2 Pro and Vertex 2 EX are SF-1500 based, but Vertex 2 (without any suffix) is SF-1200 based. Here is the table:

Model          Controller   Max Read   Max Write   IOPS
Vertex 2       SF-1200      270MB/s    260MB/s      9500
Vertex 2 LE    SF-1500      270MB/s    250MB/s         ?
Vertex 2 Pro   SF-1500      280MB/s    270MB/s     19000
Vertex 2 EX    SF-1500      280MB/s    270MB/s     25000

21.05.2010 17:09, Attila Mravik kirjoitti: AFAIK OCZ Vertex 2 does not use volatile DRAM cache but non-volatile NAND grid. Whether it respects or ignores the cache flush seems irrelevant. There has been previous discussion about this: http://comments.gmane.org/gmane.os.solaris.opensolaris.zfs/35702 I'm pretty sure that all SandForce-based SSDs don't use DRAM as their cache, but take a hunk of flash to use as scratch space instead. Which means that they'll be OK for ZIL use. Also: http://www.techspot.com/news/37729-ocz-vertex-2-pro-100gb-ssd-review.html Another benefit of SandForce's architecture is that the SSD keeps information on the NAND grid and removes the need for a separate cache buffer DRAM module. The result is a faster transaction, albeit at the expense of total storage capacity. So if I interpret them correctly, what they chose to do with the current incarnation of the architecture is actually reserve some of the primary memory capacity for I/O transaction management. In plain English, if the system gets interrupted either by power or by a crash, when it initializes the next time, it can read from its transaction space and resume where it left off. This makes it durable. Here is a detailed explanation of the SandForce controllers: http://www.anandtech.com/show/3661/understanding-sandforces-sf1200-sf1500-not-all-drives-are-equal So the SF-1500 is enterprise class and relies on a supercap, the SF-1200 is consumer class and does not rely on a supercap. The SF-1200 firmware on the other hand doesn’t assume the presence of a large capacitor to keep the controller/NAND powered long enough to complete all writes in the event of a power failure. As such it does more frequent check pointing and doesn’t guarantee the write in progress will complete before it’s acknowledged. As I understand it, the SF-1200 will ack the sync write only after it is written to flash thus reducing write performance. There is an interesting part about firmwares and OCZ having an exclusive firmware in the Vertex 2 series which based on the SF-1200 but its random write IOPS is not capped at 10K (while other vendors and other SSDs from OCZ using the SF-1200 are capped, unless they sell the drive with the RC firmware which is for OEM evaluation and not production ready but does not contain the IOPS cap). ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Tank zpool has tanked out :(
Andreas, Does the pool tank actually have 6 disks c7t0-c7t5 and c7t3d0 is now masking c7t5d0 or it is a 5-disk configuration with c7t5 repeated twice? If it is the first case (c7t0-c7t5), then I would check how these devices are connected before attempting to replace the c7t3d0 disk. What does the format utility display for these devices? I haven't seen this error but others on this list have resolved this problem by exporting and importing the pool. Always have good backups of your data. Thanks, Cindy On 05/21/10 03:26, Andreas Iannou wrote: Hi there, My zpool tank has been chugging along nicely but after a failed attempt at offlining a misbehaving drive I've got a wierd sitation. pool: tank state: DEGRADED status: One or more devices has been taken offline by the administrator. Sufficient replicas exist for the pool to continue functioning in a degraded state. action: Online the device using 'zpool online' or replace the device with 'zpool replace'. scrub: none requested config: NAMESTATE READ WRITE CKSUM tankDEGRADED 0 0 0 raidz1-0 DEGRADED 0 0 0 c7t0d0 ONLINE 0 0 0 c7t4d0 ONLINE 0 0 0 c7t1d0 ONLINE 0 0 0 *c7t3d0 ONLINE 0 0 0* c7t2d0 ONLINE 0 0 0 *c7t3d0 OFFLINE 0 0 0* errors: No known data errors Why does that particular drive appear twice? I am on SNV_134. They are 6x500Gb Western Digital RE drives. I have a spare 500Gb on another controller (c0t0d0) which I want to use to keep replace the (probably dying) drive but I'm not sure I can do this and it will correctly remove the one I want: # zpool replace tank c7t3d0 c0t0d0 Any ideas? Gracias, Andre Find it at CarPoint.com.au New, Used, Demo, Dealer or Private? http://clk.atdmt.com/NMN/go/206222968/direct/01/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] send/recv over ssh
On Fri, May 21, 2010 at 7:12 AM, David Dyer-Bennet d...@dd-b.net wrote: On Thu, May 20, 2010 19:44, Freddie Cash wrote: And you can always patch OpenSSH with HPN, thus enabling the NONE cipher, which disable encryption for the data transfer (authentication is always encrypted). And twiddle the internal buffers that OpenSSH uses to improve transfer rates, especially on 100 Mbps or faster links. Ah! I've been wanting that for YEARS. Very glad to hear somebody has done it. ssh-1 has had the 'none' cipher from day one, though it looks like openssh has removed it at some point. Fixing the buffers seems to be a nice tweak though. With the common use of SSH for for moving bulk data (under rsync as well), this is a really useful idea. Of course one should think about where one I think there's a certain assumption that using ssh = safe, and by enabling a none cipher you break that assumption. All of us know better, but less experienced admins may not. -B -- Brandon High : bh...@freaks.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] send/recv over ssh
On Fri, May 21, 2010 at 10:59 AM, Brandon High bh...@freaks.com wrote: On Fri, May 21, 2010 at 7:12 AM, David Dyer-Bennet d...@dd-b.net wrote: On Thu, May 20, 2010 19:44, Freddie Cash wrote: And you can always patch OpenSSH with HPN, thus enabling the NONE cipher, which disable encryption for the data transfer (authentication is always encrypted). And twiddle the internal buffers that OpenSSH uses to improve transfer rates, especially on 100 Mbps or faster links. Ah! I've been wanting that for YEARS. Very glad to hear somebody has done it. ssh-1 has had the 'none' cipher from day one, though it looks like openssh has removed it at some point. Correct. It was available in early OpenSSH version, but then removed as it could compromise security. And the OpenSSH devs continue to reject any patches that re-enable the none cipher for this reason. Fixing the buffers seems to be a nice tweak though. Yes, this really makes a difference. We were initially bottlenecked by SSH (100-200 Mbps) for our rsync connections (gigabit fibre between buildings) between two FreeBSD servers (low CPU use, medium drive I/O). Bumping the buffers to 16384 on each side increased it to over 500 Mbps (now limited by CPU). We've since dropped it to 4096, as we have a lot of non-HPN-enabled remote sites we need to rysnc from, and anything over 4096 causes the connection to drop (remote end can't keep up). With the common use of SSH for for moving bulk data (under rsync as well), this is a really useful idea. Of course one should think about where one I think there's a certain assumption that using ssh = safe, and by enabling a none cipher you break that assumption. All of us know better, but less experienced admins may not. That's the gist of the OpenSSH devs' reasoning for rejecting the HPN patches everytime they are submitted. :) -- Freddie Cash fjwc...@gmail.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
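For reference, with the HPN patch on both ends the same thing can be done per-connection instead of in the config files. The option names below are from the HPN patch as I remember them (they do not exist in stock OpenSSH, and NoneEnabled has to be allowed on the server side too), so treat this as a sketch and check the patched ssh_config(5); dataset and host names are placeholders:

zfs send -R tank/data@snap | \
    ssh -oNoneEnabled=yes -oNoneSwitch=yes -oHPNBufferSize=16384 \
        ${RECV_HOST} "zfs recv -d backup"

As I understand it, if the NONE switch doesn't negotiate (older server, interactive session, etc.) the connection simply stays encrypted, so the worst case is the usual cipher overhead rather than a failed transfer.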
Re: [zfs-discuss] send/recv over ssh
On Fri, May 21, 2010 12:59, Brandon High wrote: On Fri, May 21, 2010 at 7:12 AM, David Dyer-Bennet d...@dd-b.net wrote: On Thu, May 20, 2010 19:44, Freddie Cash wrote: And you can always patch OpenSSH with HPN, thus enabling the NONE cipher, which disable encryption for the data transfer (authentication is always encrypted). And twiddle the internal buffers that OpenSSH uses to improve transfer rates, especially on 100 Mbps or faster links. Ah! I've been wanting that for YEARS. Very glad to hear somebody has done it. ssh-1 has had the 'none' cipher from day one, though it looks like openssh has removed it at some point. Fixing the buffers seems to be a nice tweak though. I thought I remembered a none cipher, but couldn't find it the other year and decided I must have been wrong. I did use ssh-1, so maybe I really WAS remembering after all. With the common use of SSH for for moving bulk data (under rsync as well), this is a really useful idea. Of course one should think about where one I think there's a certain assumption that using ssh = safe, and by enabling a none cipher you break that assumption. All of us know better, but less experienced admins may not. Seems a high price to pay to try to protect idiots from being idiots. Anybody who doesn't understand that encryption = none means it's not encrypted and hence not safe isn't safe as an admin anyway. -- David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/ Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/ Photos: http://dd-b.net/photography/gallery/ Dragaera: http://dragaera.info ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Interesting experience with Nexenta - anyone seen it?
On Thu, May 20, 2010 at 2:23 PM, Miika Vesti miika.ve...@trivore.com wrote: I'm pretty sure that all SandForce-based SSDs don't use DRAM as their cache, but take a hunk of flash to use as scratch space instead. Which means that they'll be OK for ZIL use. I've read conflicting reports that the controller contains a small DRAM cache. So while it doesn't rely on an external DRAM cache, it does have one: http://www.legitreviews.com/article/1299/2/ As we noted, the Vertex 2 doesn't have any cache chips on it as that is because the SandForce controller itself is said to carry a small cache inside that is a number of megabytes in size. Another benefit of SandForce's architecture is that the SSD keeps information on the NAND grid and removes the need for a separate cache buffer DRAM module. The result is a faster transaction, albeit at the expense of total storage capacity. Again, conflicting reports indicate otherwise. http://www.legitreviews.com/article/1299/2/ That adds up to 128GB of storage space, but only 93.1GB of it will be usable space! The 'hidden' capacity is used for wear leveling, which is crucial to keeping SSDs running as long as possible. My understanding is that the controller contains enough cache to buffer enough data to write a complete erase block size, eliminating the need to read / erase / write that a partial block write entails. It's reported to do a copy-on-write, so it doesn't need to do a read of existing blocks when making changes, which gives it such high iops - Even random writes are turned into sequential writes (much like how ZFS works) of entire erase blocks. The excessive spare area is used to ensure that there are always full pages free to write to. (Some vendors are releasing consumer drives with 60/120/240 GB, using 7% reserved space rather than the 27% that the original drives ship with.) With an unexpected power loss, you could still lose any data that's cached in the controller, or any uncommitted changes that have been partially written to the NAND I hate having to rely on sites like Legit Reviews and Anandtech for technical data, but there don't seem to be non-fanboy sites doing comprehensive reviews of the drives ... -B -- Brandon High : bh...@freaks.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Interesting experience with Nexenta - anyone seen it?
dd == David Dyer-Bennet d...@dd-b.net writes: dd Just how DOES one know something for a certainty, anyway? science. Do a test like Lutz did on X25M G2. see list archives 2010-01-10. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] send/recv over ssh
On Fri, May 21, 2010 at 11:28 AM, David Dyer-Bennet d...@dd-b.net wrote: I thought I remembered a none cipher, but couldn't find it the other year and decided I must have been wrong. I did use ssh-1, so maybe I really WAS remembering after all.

It may have been in ssh2 as well, or at least the commercial version .. I thought it used to be a compile time option for openssh too.

Seems a high price to pay to try to protect idiots from being idiots. Anybody who doesn't understand that encryption = none means it's not encrypted and hence not safe isn't safe as an admin anyway.

Well, it won't expose your passwords since the key exchange is still encrypted ... That's good, right?

Circling back to the original topic, you can use ssh to start up mbuffer on the remote side, then start the send. Something like:

#!/bin/bash
ssh -f r...@${RECV_HOST} "mbuffer -q -I ${SEND_HOST}:1234 | zfs recv puddle/tank"
sleep 1
zfs send -R tank/foo/bar | mbuffer -O ${RECV_HOST}:1234

When I was moving datasets between servers, I was on the console of both, so manually starting the send/recv was not a problem. I've tried doing it with netcat rather than mbuffer but it was painfully slow, probably due to network buffers. ncat (from the nmap devs) may be a suitable alternative, and can support ssl and certificate based auth. -B -- Brandon High : bh...@freaks.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
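Since ncat came up: a rough equivalent using ncat in place of mbuffer might look like the following. This is an untested sketch with the same placeholder pool/dataset names as above; ncat also has --ssl and --allow options if you want the stream encrypted or restricted to the sender's address.

# on the receiving host
ncat -l 1234 | zfs recv puddle/tank

# on the sending host
zfs send -R tank/foo/bar | ncat ${RECV_HOST} 1234

mbuffer's advantage over either netcat or ncat is its large userland buffer, which keeps a momentarily slow receiver from stalling the sender; that is exactly the problem described above, so mbuffer is probably still the first thing to try.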
Re: [zfs-discuss] Interesting experience with Nexenta - anyone seen it?
Now, if someone would make a Battery FOB, that gives broken SSD 60 seconds of power, then we could use the consumer SSD's in servers again with real value instead of CYA value. You know- it would probably be sufficient to provide the SSD with _just_ a big capacitor bank. If the host lost power it would stop writing and if the SSD still had power it would probably use the idle time to flush it's buffers. Then there would be world peace! Yeah- got a little carried away there. Still this seems like an experiment I'm going to have to try on my home server out of curiosity more than anything else :) -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] send/recv over ssh
I seem to be getting decent speed with arcfour (this was what i was using to begin with) Thanks for all the helpthis honestly was just me being stupid...looking back on yesterday, i can't even remember what i was doing wrong nowi was REALLY tired when i asked this question. On Fri, May 21, 2010 at 2:43 PM, Brandon High bh...@freaks.com wrote: On Fri, May 21, 2010 at 11:28 AM, David Dyer-Bennet d...@dd-b.net wrote: I thought I remembered a none cipher, but couldn't find it the other year and decided I must have been wrong. I did use ssh-1, so maybe I really WAS remembering after all. It may have been in ssh2 as well, or at least the commercial version .. I thought it used to be a compile time option for openssh too. Seems a high price to pay to try to protect idiots from being idiots. Anybody who doesn't understand that encryption = none means it's not encrypted and hence not safe isn't safe as an admin anyway. Well, it won't expose your passwords since the key exchange it still encrypted ... That's good, right? Circling back to the original topic, you can use ssh to start up mbuffer on the remote side, then start the send. Something like: #!/bin/bash ssh -f r...@${recv_host} mbuffer -q -I ${SEND_HOST}:1234 | zfs recv puddle/tank sleep 1 zfs send -R tank/foo/bar | mbuffer -O ${RECV_HOST}:1234 When I was moving datasets between servers, I was on the console of both, so manually starting the send/recv was not a problem. I've tried doing it with netcat rather than mbuffer but it was painfully slow, probably due to network buffers. ncat (from the nmap devs) may be a suitable alternative, and can support ssl and certificate based auth. -B -- Brandon High : bh...@freaks.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
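For anyone hitting this thread from the archives, the whole working pipeline with an explicitly chosen lightweight cipher is a single line (host and dataset names are placeholders):

zfs send -R tank/foo/bar@snap | ssh -c arcfour ${RECV_HOST} "zfs recv -d puddle"

arcfour trades cryptographic strength for speed, so it sits between the default cipher and the patched-in NONE option discussed earlier in the thread.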
[zfs-discuss] ZFS no longer working with FC devices.
For years I have been running a zpool using a Fibre Channel array with no problems. I would scrub every so often and dump huge amounts of data (tens or hundreds of GB) around and it never had a problem outside of one confirmed (by the array) disk failure. I upgraded to sol10x86 05/09 last year and since then I have discovered any sufficiently high I/O from ZFS starts causing timeouts and off-lining disks. This leads to failure (once rebooted and cleaned all is well) long term because you can no longer scrub reliably. ATA, SATA and SAS do not seem to suffer this problem. I tried upgrading, and then doing a fresh load of U8 and the problem persists. My FC hardware is: Sun A5100 (14 disk) array. Hitachi 146GB FC disks (started with 9GB SUN disks, moved to 36 GB disks from a variety of manufacturers, and then to 72 GB IBM disks before this last capacity upgrade). Sun branded Qlogic 2310 FC cards (375-3102). Sun qlc drivers and MPIO is enabled. The rest of the system: 2 CPU Opteron board and chips(2GHZ), 8GB RAM. When a hard drive fails in the enclosure, it bypasses the bad drive and turns on a light to let me know a disk failure has happened. This never happens with this event, pointing it to be a software problem. Once it goes off the rails and starts off-lining disks it causes the system to have problems. Login for a user takes forever (40 minutes minimum to pass the last login message), any command touching on storage or zfs/zpool hangs for just as long. I can reliably reproduce the issue by either copying a large amount of data into the pool or running a scrub. All disks test fine via destructive tests in format. I just reproduced it by clearing and creating anew pool called share: # zpool status share pool: share state: ONLINE scrub: none requested config: NAME STATE READ WRITE CKSUM share ONLINE 0 0 0 raidz2 ONLINE 0 0 0 c0t50050767190B6C76d0 ONLINE 0 0 0 c0t500507671908E72Bd0 ONLINE 0 0 0 c0t500507671907A32Ad0 ONLINE 0 0 0 c0t50050767190C4CFDd0 ONLINE 0 0 0 c0t500507671906704Dd0 ONLINE 0 0 0 c0t500507671918892Ad0 ONLINE 0 0 0 raidz2 ONLINE 0 0 0 c0t50050767190D11E4d0 ONLINE 0 0 0 c0t500507671915CABEd0 ONLINE 0 0 0 c0t50050767191371C7d0 ONLINE 0 0 0 c0t5005076719125EDBd0 ONLINE 0 0 0 c0t50050767190E4DABd0 ONLINE 0 0 0 c0t5005076719147ECAd0 ONLINE 0 0 0 errors: No known data errors messages logs something like the following: May 21 15:27:54 solarisfc scsi: [ID 243001 kern.warning] WARNING: /scsi_vhci (scsi_vhci0): May 21 15:27:54 solarisfc /scsi_vhci/d...@g50050767191371c7 (sd2): Command Timeout on path /p...@0,0/pci1022,7...@a/pci1077,1...@3/f...@0,0 (fp1) May 21 15:27:54 solarisfc scsi: [ID 107833 kern.warning] WARNING: /scsi_vhci/d...@g50050767191371c7 (sd2): May 21 15:27:54 solarisfc SCSI transport failed: reason 'timeout': retrying command May 21 15:27:54 solarisfc scsi: [ID 243001 kern.warning] WARNING: /scsi_vhci (scsi_vhci0): May 21 15:27:54 solarisfc /scsi_vhci/d...@g50050767191371c7 (sd2): Command Timeout on path /p...@0,0/pci1022,7...@a/pci1077,1...@2/f...@0,0 (fp0) May 21 15:28:54 solarisfc scsi: [ID 107833 kern.warning] WARNING: /scsi_vhci/d...@g50050767191371c7 (sd2): May 21 15:28:54 solarisfc SCSI transport failed: reason 'timeout': giving up May 21 15:32:54 solarisfc scsi: [ID 107833 kern.warning] WARNING: /scsi_vhci/d...@g50050767191371c7 (sd2): May 21 15:32:54 solarisfc SYNCHRONIZE CACHE command failed (5) May 21 15:40:54 solarisfc scsi: [ID 107833 kern.warning] WARNING: /scsi_vhci/d...@g50050767191371c7 (sd2): May 21 15:40:54 solarisfc drive offline May 21 15:48:55 
solarisfc scsi: [ID 107833 kern.warning] WARNING: /scsi_vhci/d...@g50050767191371c7 (sd2): May 21 15:48:55 solarisfc drive offline May 21 15:56:55 solarisfc scsi: [ID 107833 kern.warning] WARNING: /scsi_vhci/d...@g50050767191371c7 (sd2): May 21 15:56:55 solarisfc drive offline May 21 16:04:55 solarisfc scsi: [ID 107833 kern.warning] WARNING: /scsi_vhci/d...@g50050767191371c7 (sd2): May 21 16:04:55 solarisfc drive offline May 21 16:04:56 solarisfc fmd: [ID 441519 daemon.error] SUNW-MSG-ID: ZFS-8000-FD, TYPE: Fault, VER: 1, SEVERITY: Major May 21 16:04:56 solarisfc EVENT-TIME: Fri May 21 16:04:56 EDT 2010 May 21 16:04:56 solarisfc PLATFORM: To Be Filled By O.E.M., CSN: To Be Filled By O.E.M., HOSTNAME: solarisfc May 21 16:04:56 solarisfc SOURCE: zfs-diagnosis, REV: 1.0 May 21 16:04:56 solarisfc EVENT-ID: 295d7729-9a93-47f1-de9d-ba3a08b2d477 May 21 16:04:56 solarisfc DESC: The
Re: [zfs-discuss] New SSD options
On Thu, May 20, 2010 at 8:46 PM, Don d...@blacksun.org wrote: I'm kind of flabbergasted that no one has simply stuck a capacitor on a more reasonable drive. I guess the market just isn't big enough- but I find that hard to believe. I just spoke with a co-worker about doing something about it. He says he can design a small in-line UPS that will deliver 20-30 seconds of 3.3V, 5V, and 12V to the SATA power connector for about $50 in parts. It would be even less if only one voltage was needed. That should be enough for most any SSD to finish any pending writes. Any design that we come up with will be made publicly available under a Creative Commons or other similar license. -B -- Brandon High : bh...@freaks.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] New SSD options
I just spoke with a co-worker about doing something about it. He says he can design a small in-line UPS that will deliver 20-30 seconds of 3.3V, 5V, and 12V to the SATA power connector for about $50 in parts. It would be even less if only one voltage was needed. That should be enough for most any SSD to finish any pending writes. Oh I wasn't kidding when I said I was going to have to try this with my home server. I actually do some circuit board design and this would be an amusing project. All you probably need is 5v- I'll look into it. -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] New SSD options
On 05/22/10 12:31 PM, Don wrote: I just spoke with a co-worker about doing something about it. He says he can design a small in-line UPS that will deliver 20-30 seconds of 3.3V, 5V, and 12V to the SATA power connector for about $50 in parts. It would be even less if only one voltage was needed. That should be enough for most any SSD to finish any pending writes. Oh I wasn't kidding when I said I was going to have to try this with my home server. I actually do some circuit board design and this would be an amusing project. All you probably need is 5v- I'll look into it.

Two Supercaps should do the trick. Drive connectors only have 5 and 12v. -- Ian. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Tank zpool has tanked out :(
Hello Cindy, Does the pool tank actually have 6 disks c7t0-c7t5 and c7t3d0 is now masking c7t5d0 or it is a 5-disk configuration with c7t5 repeated twice? There are 6 disks connected onto the onboard Intel SATA controller (this is a home NAS). There are another four that represent 2x for the rpool and 2x for another zpool for backup snapshots from our Macs. The thing is that I removed the motherboard and everything from the case to see which drive was causing issues, I may have swapped the order of the drives around and confused ZFS. What does the format utility display for these devices? Format presents it correctly. 4. c7t0d0 /p...@0,0/pci1458,b...@11/d...@0,0 5. c7t1d0 /p...@0,0/pci1458,b...@11/d...@1,0 6. c7t2d0 /p...@0,0/pci1458,b...@11/d...@2,0 7. c7t3d0 /p...@0,0/pci1458,b...@11/d...@3,0 8. c7t4d0 /p...@0,0/pci1458,b...@11/d...@4,0 9. c7t5d0 /p...@0,0/pci1458,b...@11/d...@5,0 others on this list have resolved this problem by exporting and importing the pool. Can you still export a pool when a disk is offline? Cheers, Andre Date: Fri, 21 May 2010 11:51:13 -0600 From: cindy.swearin...@oracle.com To: andreas_wants_the_w...@hotmail.com CC: zfs-discuss@opensolaris.org Subject: Re: [zfs-discuss] Tank zpool has tanked out :( Andreas, Does the pool tank actually have 6 disks c7t0-c7t5 and c7t3d0 is now masking c7t5d0 or it is a 5-disk configuration with c7t5 repeated twice? If it is the first case (c7t0-c7t5), then I would check how these devices are connected before attempting to replace the c7t3d0 disk. What does the format utility display for these devices? I haven't seen this error but others on this list have resolved this problem by exporting and importing the pool. Always have good backups of your data. Thanks, Cindy _ If It Exists, You'll Find it on SEEK. Australia's #1 job site http://clk.atdmt.com/NMN/go/157639755/direct/01/___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Opteron 6100? Does it work with opensolaris?
here ya go (sorry for the late reply) wonsl...@wonslung-raidz2:~$ kstat -m cpu_info -c misc module: cpu_infoinstance: 0 name: cpu_info0 class:misc brand AMD Opteron(tm) Processor 6128 cache_id0 chip_id 0 clock_MHz 2000 clog_id 0 core_id 0 cpu_typei386 crtime 9168.3602694 current_clock_Hz20 current_cstate 1 family 16 fpu_typei387 compatible implementation x86 (chipid 0x0 AuthenticAMD 100F91 family 16 model 9 step 1 clock 2000 MHz) model 9 ncore_per_chip 8 ncpu_per_chip 8 pg_id 3 pkg_core_id 0 snaptime113230.79067 socket_type G34 state on-line state_begin 1274377642 stepping1 supported_frequencies_Hz 8:10:12:15:20 supported_max_cstates 0 vendor_id AuthenticAMD module: cpu_infoinstance: 1 name: cpu_info1 class:misc brand AMD Opteron(tm) Processor 6128 cache_id1 chip_id 0 clock_MHz 2000 clog_id 1 core_id 1 cpu_typei386 crtime 9171.356087394 current_clock_Hz20 current_cstate 1 family 16 fpu_typei387 compatible implementation x86 (chipid 0x0 AuthenticAMD 100F91 family 16 model 9 step 1 clock 2000 MHz) model 9 ncore_per_chip 8 ncpu_per_chip 8 pg_id 4 pkg_core_id 1 snaptime113230.734042092 socket_type G34 state on-line state_begin 1274377645 stepping1 supported_frequencies_Hz 8:10:12:15:20 supported_max_cstates 0 vendor_id AuthenticAMD module: cpu_infoinstance: 2 name: cpu_info2 class:misc brand AMD Opteron(tm) Processor 6128 cache_id2 chip_id 0 clock_MHz 2000 clog_id 2 core_id 2 cpu_typei386 crtime 9171.410218363 current_clock_Hz8 current_cstate 0 family 16 fpu_typei387 compatible implementation x86 (chipid 0x0 AuthenticAMD 100F91 family 16 model 9 step 1 clock 2000 MHz) model 9 ncore_per_chip 8 ncpu_per_chip 8 pg_id 5 pkg_core_id 2 snaptime113230.734600041 socket_type G34 state on-line state_begin 1274377645 stepping1 supported_frequencies_Hz 8:10:12:15:20 supported_max_cstates 0 vendor_id AuthenticAMD module: cpu_infoinstance: 3 name: cpu_info3 class:misc brand AMD Opteron(tm) Processor 6128 cache_id3 chip_id 0 clock_MHz 2000 clog_id 3 core_id 3 cpu_typei386 crtime 9171.440232239 current_clock_Hz20 current_cstate 0
Re: [zfs-discuss] Opteron 6100? Does it work with opensolaris?
Something i've been meaning to ask. I'm transferring some data from my older server to my newer one. The older server has a socket 775 Intel Q9550, 8 GB DDR2 800, and 20 1TB drives in raidz2 (3 vdevs, 2 with 7 drives, one with 6) connected to 3 AOC-SAT2-MV8 cards, spread as evenly across them as i could. The new server is socket G34 based with the Opteron 6128 8-core CPU, 16 GB DDR3 1333 ECC RAM, and 10 2TB drives (so far) in a single raidz2 vdev connected to 3 LSI SAS3081E-R cards (flashed with IT firmware). I'm sure this is due to something i don't understand, but during zfs send/recv from the old server to the new server (3 send/recv streams) I'm noticing the loadavg on the old server is much less than the new one.

This is from top on the old server:

load averages: 1.58, 1.57, 1.37; up 5+05:13:17 04:52:42

and this is the newer server:

load averages: 6.20, 5.98, 5.30; up 1+05:03:02 18:49:57

Shouldn't the newer server have LESS load? Please forgive my ubernoobness. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] New SSD options
On Fri, May 21, 2010 at 5:31 PM, Don d...@blacksun.org wrote: Oh I wasn't kidding when I said I was going to have to try this with my home server. I actually do some circuit board design and this would be an amusing project. All you probably need is 5v- I'll look into it. The SATA power connector supplies 3.3, 5 and 12v. A complete solution will have all three. Most drives use just the 5v, so you can probably ignore 3.3v and 12v. You'll need to use a step up DC-DC converter and be able to supply ~ 100mA at 5v. (I can't find any specific numbers on power consumption. Intel claims 75mW - 150mW for the X25-M. USB is rated at 500mA at 5v, and all drives that I've seen can run in an un-powered USB case.) It's actually easier/cheaper to use a LiPoly battery charger and get a few minutes of power than to use an ultracap for a few seconds of power. Most ultracaps are ~ 2.5v and LiPoly is 3.7v, so you'll need a step up converter in either case. If you're supplying more than one voltage, you should use a microcontroller to shut off all the charge pumps at once when the battery / ultracap runs low. If you're only supplying 5V, it doesn't matter. Cost for a 5v only system should be $30 - $35 in one-off prototype-ready components with a 1100mAH battery (using prices from Sparkfun.com), plus the cost for an enclosure, etc. A larger buy, a custom PCB, and a smaller battery would probably reduce the cost 20-50%. -B -- Brandon High : bh...@freaks.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Opteron 6100? Does it work with opensolaris?
On 05/22/10 12:54 PM, Thomas Burgess wrote: Something i've been meaning to ask I'm transfering some data from my older server to my newer one. the older server has a socket 775 intel Q9550 8 gb ddr2 800 20 1TB drives in raidz2 (3 vdevs, 2 with 7 drives one with 6) connected to 3 AOC-SAT2-MV8 cards spread as evenly across them as i could The new server is socket g34 based with the opteron 6128 8 core cpu with 16 gb ddr3 1333 ECC ram with 10 2TB drives (so far) in a single raidz2 vdev connected to 3 LSI SAS3081E-R cards (flashed with IT firmware) I'm sure this is due to something i don't understand, but durring zfs send/recv from the old server to the new server (3 send/recv streams) I'm noticing the loadavg on the old server is much less than the new one this is form top on the old server: load averages: 1.58, 1.57, 1.37; up 5+05:13:17 04:52:42 and this is the newer server load averages: 6.20, 5.98, 5.30; up 1+05:03:02 18:49:57 shouldn't the newer server have LESS load? Do you have compression on? Compressing is more CPU intensive than uncompressing. -- Ian. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Opteron 6100? Does it work with opensolaris?
On Fri, May 21, 2010 at 5:54 PM, Thomas Burgess wonsl...@gmail.com wrote: shouldn't the newer server have LESS load? Please forgive my ubernoobness. Depends on what it's doing! Load average is really how many process are waiting to run, so it's not always a useful metric. If there are processes waiting on disk, you can have high load with almost no cpu use. Check the iowait with iostat or top. You've got a pretty wide stripe, which isn't going to give the best performance, especially for random write workloads. Your old 3 vdev config will have better random write performance. Check to see what's using the CPU with top or prstat. prstat gives better info for threads, imo. -B -- Brandon High : bh...@freaks.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
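Concretely, something along these lines separates "CPU-bound" from "waiting on disk" fairly quickly (the intervals are arbitrary):

# per-thread microstate accounting, 10-second samples
prstat -mL 10

# per-device utilization plus a CPU summary line, 30-second samples
iostat -xcn 30

In prstat -m the LAT and SLP columns show time spent waiting rather than running; in iostat, devices sitting at high %b with a deep actv queue while the CPUs stay mostly idle would point at the single wide raidz2, not the new Opteron, as the bottleneck.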
Re: [zfs-discuss] Tank zpool has tanked out :(
On Fri, May 21, 2010 at 5:39 PM, Andreas Iannou andreas_wants_the_w...@hotmail.com wrote: Can you still export a pool when a disk is offline? You can try booting from a live CD and doing 'zpool import -f', then export it. That may sort things out. You may also need to remove /etc/zfs/zpool.cache from your BE, but I'm not sure. -B -- Brandon High : bh...@freaks.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
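Spelled out, the sequence from the live CD would be roughly this (the -f is needed because the pool still looks imported by the installed system):

zpool import          # just scan: each disk should show up once
zpool import -f tank
zpool status tank     # device paths are re-read from the labels here
zpool export tank

Then boot back into the installed OS and import tank again; if the stale entry came from the cache file rather than the labels, removing /etc/zfs/zpool.cache from the boot environment before that last import is the extra step Brandon mentions.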
Re: [zfs-discuss] Opteron 6100? Does it work with opensolaris?
is 3 zfs recv's random?

On Fri, May 21, 2010 at 10:03 PM, Brandon High bh...@freaks.com wrote:
> You've got a pretty wide stripe, which isn't going to give the best performance, especially for random write workloads.

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Opteron 6100? Does it work with opensolaris?
yeah, i'm aware of the performance aspects. I use these servers as mostly HD video servers for my house... they don't need to perform amazingly.

I originally went with the setup on the old server because of everything i had read about performance with wide stripes... in all honesty it performed amazingly well, much more than i truly need... i plan to have 2 raidz2 stripes of 10 drives in this server (the new one). At most it will be serving 4-5 HD streams (mostly 720p mkv files, with some 1080p as well).

The older server can EASILY max out 2 Gb/s links... i imagine the new server will be able to do this as well... i think a scrub of the old server takes 4-5 hours. i'm not sure what this equates to in MB/s but it's WAY more than i ever really need. This is what led me to use wider stripes in the new server, and i'm honestly considering redoing the old server as well; if i switched to 2 wider stripes instead of 3 i'd gain another TB or two... for my use i don't think that would be a horrible thing.

On Fri, May 21, 2010 at 10:03 PM, Brandon High bh...@freaks.com wrote:
> You've got a pretty wide stripe, which isn't going to give the best performance, especially for random write workloads. Your old 3-vdev config will have better random write performance.

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
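Since the "not sure what this equates to in MB/s" question is just arithmetic, here is a rough conversion sketch; the amount of allocated data below is a made-up placeholder (a scrub only reads allocated blocks, so substitute whatever 'zpool list' reports for the pool):

# Convert a scrub duration into an average read rate. A scrub touches
# allocated data only, so the pool usage below is a hypothetical example
# value -- substitute the ALLOC figure from 'zpool list'.

allocated_tb = 10.0        # hypothetical: 10 TB of allocated data
scrub_hours = 4.5          # middle of the quoted 4-5 hour range

allocated_mb = allocated_tb * 1024 * 1024
scrub_seconds = scrub_hours * 3600

print(f"Average scrub rate: {allocated_mb / scrub_seconds:.0f} MB/s")
# 10 TB in 4.5 hours works out to roughly 650 MB/s across the whole pool.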
Re: [zfs-discuss] Opteron 6100? Does it work with opensolaris?
On Fri, May 21, 2010 at 7:57 PM, Thomas Burgess wonsl...@gmail.com wrote:
> is 3 zfs recv's random?

It might be. What do a few reports of 'iostat -xcn 30' look like?

-B
--
Brandon High : bh...@freaks.com
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Opteron 6100? Does it work with opensolaris?
I can't tell you for sure. For some reason the server lost power and it's taking forever to come back up. (i'm really not sure what happened)

anyways, this leads me to my next couple of questions:

Is there any way to resume a zfs send/recv?

Why is it taking so long for the server to come up? it's stuck on "Reading ZFS config" and there is a FLURRY of hard drive lights blinking (all 10 in sync)

On Sat, May 22, 2010 at 12:26 AM, Brandon High bh...@freaks.com wrote:
> It might be. What do a few reports of 'iostat -xcn 30' look like?

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Opteron 6100? Does it work with opensolaris?
On 05/22/10 04:44 PM, Thomas Burgess wrote:
> I can't tell you for sure. For some reason the server lost power and it's taking forever to come back up. (i'm really not sure what happened) anyways, this leads me to my next couple of questions:
>
> Is there any way to resume a zfs send/recv?

Nope.

> Why is it taking so long for the server to come up? it's stuck on "Reading ZFS config" and there is a FLURRY of hard drive lights blinking (all 10 in sync)

It's cleaning up the mess. If you had a lot of data copied over, it'll take a while deleting it!

-- Ian.
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Opteron 6100? Does it work with opensolaris?
yah, it seems that rsync is faster for what i need anyways... at least right now...

On Sat, May 22, 2010 at 1:07 AM, Ian Collins i...@ianshome.com wrote:
>> Is there any way to resume a zfs send/recv?
>
> Nope.

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Opteron 6100? Does it work with opensolaris?
extended device statistics r/sw/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device 0.00.00.00.0 0.0 0.00.0 983.9 0 0 fd0 3.05.5 152.2 67.8 0.0 0.05.01.2 1 1 c8t1d0 41.33.4 1288.3 69.1 0.1 0.12.73.3 3 8 c5t0d0 41.43.6 1288.0 69.0 0.1 0.22.03.4 3 7 c4t0d0 41.43.5 1287.4 69.1 0.1 0.12.63.3 3 8 c4t1d0 41.33.5 1289.1 69.1 0.1 0.22.13.6 3 7 c5t1d0 40.93.6 1267.8 66.6 0.1 0.12.23.0 3 7 c4t2d0 40.73.5 1269.1 66.6 0.1 0.12.43.2 3 7 c5t2d0 40.93.6 1267.9 66.6 0.1 0.12.43.1 3 7 c4t3d0 41.33.5 1288.6 69.0 0.1 0.22.13.5 3 7 c5t3d0 28.03.3 1060.1 60.3 0.0 0.11.22.4 1 4 c4t5d0 27.83.2 1053.8 60.3 0.0 0.11.22.5 1 4 c5t4d0 40.83.5 1268.8 66.6 0.1 0.12.33.1 3 7 c5t5d0 28.13.1 1067.1 60.3 0.0 0.11.32.5 1 4 c5t6d0 28.23.1 1072.2 60.3 0.0 0.11.22.5 1 4 c5t7d0 40.73.6 1268.3 66.6 0.1 0.12.03.3 3 7 c6t0d0 41.33.5 1288.6 69.1 0.1 0.12.63.3 3 8 c6t1d0 40.73.5 1269.2 66.6 0.1 0.12.53.2 3 8 c6t2d0 41.33.6 1288.7 69.0 0.1 0.22.13.5 3 7 c6t3d0 40.83.5 1268.8 66.6 0.1 0.12.43.1 3 7 c6t4d0 29.03.2 .5 60.3 0.0 0.11.32.5 1 4 c6t5d0 28.93.2 1112.6 60.3 0.0 0.11.32.6 1 4 c6t6d0 2.4 15.8 57.0 1962.8 0.2 0.0 10.01.8 3 3 c8t5d0 0.00.00.00.0 0.0 0.00.00.2 0 0 c4t7d0 cpu us sy wt id 40 32 0 28 extended device statistics r/sw/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device 0.00.00.00.0 0.0 0.00.00.0 0 0 fd0 1.82.3 13.7 34.7 0.0 0.03.00.9 0 0 c8t1d0 106.12.4 5636.1 101.4 0.5 0.24.31.5 13 16 c5t0d0 107.02.5 5607.5 101.4 0.3 0.22.52.0 8 14 c4t0d0 109.62.4 5625.9 101.3 0.4 0.13.41.3 11 14 c4t1d0 103.92.4 5665.5 101.3 0.3 0.32.92.4 9 15 c5t1d0 111.02.8 5646.9 100.7 0.4 0.13.31.3 11 15 c4t2d0 105.02.9 5700.7 100.7 0.5 0.25.01.6 14 17 c5t2d0 109.22.8 5635.1 100.6 0.4 0.13.61.3 12 15 c4t3d0 104.62.4 5676.6 101.4 0.3 0.32.92.4 8 15 c5t3d0 51.92.3 3622.1 90.0 0.1 0.21.53.4 3 8 c4t5d0 54.32.0 3696.0 90.0 0.3 0.14.41.5 7 8 c5t4d0 106.62.8 5679.7 100.6 0.3 0.32.82.8 8 16 c5t5d0 93.32.1 6861.6 89.7 0.5 0.14.91.6 12 15 c5t6d0 85.32.1 6186.8 89.9 0.3 0.23.42.8 7 14 c5t7d0 106.22.6 5678.9 100.6 0.3 0.32.82.4 8 14 c6t0d0 104.22.2 5674.1 101.3 0.3 0.33.12.9 9 16 c6t1d0 104.62.9 5655.1 100.7 0.3 0.33.22.8 9 16 c6t2d0 104.72.4 5660.9 101.4 0.3 0.33.02.4 9 15 c6t3d0 106.22.8 5691.6 100.7 0.5 0.24.11.4 12 16 c6t4d0 60.02.1 3987.1 89.9 0.2 0.13.91.4 7 9 c6t5d0 64.92.2 4581.7 89.9 0.0 0.30.64.8 2 10 c6t6d0 1.5 177.6 38.3 22270.8 2.6 0.4 14.52.2 36 39 c8t5d0 0.00.00.00.0 0.0 0.00.00.0 0 0 c4t7d0 cpu us sy wt id 41 33 0 26 extended device statistics r/sw/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device 0.00.00.00.0 0.0 0.00.00.0 0 0 fd0 1.54.2 10.2 65.9 0.0 0.04.71.1 1 1 c8t1d0 108.22.3 5851.0 104.4 0.5 0.24.61.6 14 18 c5t0d0 110.52.4 5851.0 104.4 0.0 0.50.04.7 0 15 c4t0d0 114.52.3 5828.3 104.4 0.4 0.23.71.4 12 16 c4t1d0 106.12.5 5874.3 104.5 0.0 0.60.05.4 0 16 c5t1d0 111.92.6 5937.4 101.9 0.4 0.23.71.4 12 16 c4t2d0 107.02.7 5958.2 101.8 0.5 0.24.61.6 14 18 c5t2d0 112.42.9 5960.2 102.0 0.4 0.23.71.4 11 16 c4t3d0 105.02.4 5890.0 104.4 0.0 0.60.05.5 0 17 c5t3d0 64.22.1 4534.2 91.2 0.1 0.21.83.6 4 10 c4t5d0 42.42.2 2964.4 91.2 0.2 0.14.03.3 4 8 c5t4d0 107.32.7 5937.3 102.0 0.2 0.42.03.9 6 17 c5t5d0 88.32.0 6630.2 91.1 0.5 0.25.51.7 12 15 c5t6d0 82.51.9 6108.0 90.9 0.0 0.50.06.3 0 13 c5t7d0 106.22.6 5965.5 101.8 0.0 0.60.05.5 0 16 c6t0d0 108.42.4 5891.2 104.5 0.0 0.70.06.5 0 19 c6t1d0 107.32.7 5977.0 102.0 0.2 0.51.94.2 6 18 c6t2d0 105.62.5 5879.0 104.4 0.0 0.60.05.7 0 17 c6t3d0 107.82.5 5949.5 101.9 0.5
Re: [zfs-discuss] New SSD options
> The SATA power connector supplies 3.3, 5 and 12V. A complete solution will have all three. Most drives use just the 5V, so you can probably ignore 3.3V and 12V.

I'm not interested in building something that's going to work for every possible drive config - just my config :) Both the Intel X25-E and the OCZ use only the 5V rail.

> You'll need to use a step-up DC-DC converter and be able to supply ~100 mA at 5V. It's actually easier/cheaper to use a LiPoly battery charger and get a few minutes of power than to use an ultracap for a few seconds of power. Most ultracaps are ~2.5V and LiPoly is 3.7V, so you'll need a step-up converter in either case.

Ultracapacitors are available in voltage ratings beyond 12 volts, so there is no reason to use a boost converter with them. That eliminates high-frequency switching transients right next to our SSD, which is always helpful.

In this case we have lots of room. We have a 3.5" x 1" drive bay, but a 2.5" x 1/4" hard drive. There is ample room for several of the 6.3V ELNA 1F capacitors (and our SATA power rail is a 5V regulated rail, so they should suffice) - either in series or parallel, depending on voltage or runtime requirements. http://www.elna.co.jp/en/capacitor/double_layer/catalog/pdf/dk_e.pdf

You could put 2 caps in series for better voltage tolerance or in parallel for longer runtimes. Either way you probably don't need a charge controller, a boost or buck converter, or in fact any ICs at all. It's just a small board with some caps on it.

> Cost for a 5V-only system should be $30 - $35 in one-off prototype-ready components with a 1100 mAh battery (using prices from Sparkfun.com),

You could literally split a SATA cable and add in some capacitors for just the cost of the caps themselves. The issue there is whether the caps would present too large a current drain on initial charge-up. If they do, then you need to add in charge controllers and you've got the same problems as with a LiPo battery - although without the shorter service life.

At the end of the day the real problem is whether we believe the drives themselves will actually use the quiet period on the now-dead bus to write out their caches. This is something we should ask the manufacturers, and test for ourselves.

-- This message posted from opensolaris.org
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
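To put rough numbers on the caps-only idea, a simple t = C * dV / I estimate; the ~100 mA draw is the earlier figure from the thread, while the 4.5 V minimum supply voltage is an assumption rather than a datasheet value:

# Hold-up time for capacitors hung on the 5 V rail, using t = C * dV / I.
# 1 F / 6.3 V is the ELNA part mentioned above; the 100 mA load is the
# earlier estimate from the thread; the 4.5 V cutoff is an assumed minimum
# supply voltage for the drive, not a datasheet number.

def holdup_seconds(capacitance_f, v_start, v_min, load_current_a):
    return capacitance_f * (v_start - v_min) / load_current_a

c_single = 1.0       # one 1 F cap
c_series = 0.5       # two 1 F caps in series (higher voltage rating)
c_parallel = 2.0     # two 1 F caps in parallel (6.3 V rating, more capacity)

for label, c in [("single", c_single), ("2 in series", c_series), ("2 in parallel", c_parallel)]:
    t = holdup_seconds(c, v_start=5.0, v_min=4.5, load_current_a=0.1)
    print(f"{label}: ~{t:.1f} s of hold-up")   # 5.0, 2.5 and 10.0 s respectively

Even a single cap gives a few seconds of quiet-bus time, which is in line with the "seconds, not minutes" framing above; whether the drive actually uses that window to flush its cache is the open question.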
Re: [zfs-discuss] Opteron 6100? Does it work with opensolaris?
On 05/22/10 05:22 PM, Thomas Burgess wrote:
> yah, it seems that rsync is faster for what i need anyways... at least right now...

ZFS send/receive should run at wire speed for a Gig-E link.

Ian.
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
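For context, "wire speed" on gigabit Ethernet works out roughly as follows; the 5% protocol-overhead figure is an assumption rather than a measurement:

# What "wire speed" on gigabit Ethernet means for a send/recv stream.
# The ~5% allowance for TCP/IP and Ethernet framing overhead is a rough
# assumption, not a measured number.

link_bits_per_s = 1_000_000_000
overhead = 0.05                       # assumed protocol overhead
payload_mb_per_s = link_bits_per_s * (1 - overhead) / 8 / 1e6

print(f"~{payload_mb_per_s:.0f} MB/s usable")           # ~119 MB/s
print(f"~{payload_mb_per_s * 3600 / 1e6:.2f} TB/hour")  # ~0.43 TB per hour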
Re: [zfs-discuss] Opteron 6100? Does it work with opensolaris?
well it wasn't. it was running pretty slow. i had one really big filesystem... with rsync i'm able to do multiple streams and it's moving much faster

On Sat, May 22, 2010 at 1:45 AM, Ian Collins i...@ianshome.com wrote:
> ZFS send/receive should run at wire speed for a Gig-E link.

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss