Re: [zfs-discuss] Question on 4k sectors
On Mon, 23 Jul 2012, Anonymous Remailer (austria) wrote:
> The question was relative to some older boxes running S10 and not planning
> to upgrade the OS, keeping them alive as long as possible...

Recent Solaris 10 kernel patches address drives with 4k sectors. It seems that Solaris 10 will work with 4k-sector drives, so Solaris 10 users will not be stuck.

Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Question on 4k sectors
You wrote:
> 2012-07-23 18:37, Anonymous wrote:
> > Really, it would be so helpful to know which drives we can buy with
> > confidence and which should be avoided... is there any way to know from
> > the manufacturers' web sites or do you have to actually buy one and see
> > what it does? Thanks to everyone for the info.
>
> I think that vendors' markings like 512e may give a clue to their support
> of "emulated 512-byte sectors", whatever they mean by that for a specific
> model line.

Yeah, but buying through the mail it's awfully difficult to see the vendor markings until it's too late ;)

> I believe you can be roughly certain that all 3TB HDDs except Hitachi use
> 4KB native sectors, and that 4TB disks are all 4KB.
> If these disks don't expose such sector sizing to the OS properly, you
> can work around that in several ways, including, as of recent illumos
> changes, an override config file for the SCSI driver.

The question was relative to some older boxes running S10 and not planning to upgrade the OS, keeping them alive as long as possible...

> The main problem with "avoiding" 4KB drives seems to be just the cases
> where you want to replace a single disk in an older pool built with
> 512B-native sectored drives.

Right, that's what we're concerned with.

> For new pools (or rather new complete top-level VDEVs) this does not
> matter much, except that your overheads with small data blocks can get
> noticeably bigger.

Understood.

> There were statements on this list that drives emulating 512B sectors
> (whether they announce it properly or not) are not all inherently evil -
> this emulation by itself may be of some concern regarding performance,
> but not one of reliability. Then again, firmware errors are possible in
> any part of the stack, of both older and newer models ;)

I haven't seen any post suggesting 512B emulation didn't have very adverse effects on performance. Given how touchy ZFS seems to be, I don't want to give it any excuses!
Thanks for your post.
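[For readers hitting the 512e problem on illumos: the "override config file for the SCSI driver" discussed in this thread is the sd-config-list entry in /kernel/drv/sd.conf. A rough sketch follows; the vendor/product string is a placeholder and must match your drive's actual INQUIRY data, with the vendor field padded to 8 characters.]

```
# /kernel/drv/sd.conf (illumos): force a 4 KB physical block size for a
# drive that reports 512 B sectors.  The VID/PID below is a placeholder.
sd-config-list =
    "ATA     HITACHI HDS5C302", "physical-block-size:4096";
```

[The override takes effect when the driver re-reads its configuration (typically after a reboot), and it only influences newly created vdevs, since an existing vdev keeps the ashift it was created with.]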
Re: [zfs-discuss] [zfs] LZ4 compression algorithm
----- Forwarded message from Bob Friesenhahn -----

From: Bob Friesenhahn
Date: Mon, 23 Jul 2012 12:55:44 -0500 (CDT)
To: z...@lists.illumos.org
Cc: Radio młodych bandytów, Pawel Jakub Dawidek, develo...@lists.illumos.org
Subject: Re: [zfs] LZ4 compression algorithm
Reply-To: z...@lists.illumos.org

On Mon, 23 Jul 2012, Sašo Kiselkov wrote:
> Anyway, the mere caring for clang by ZFS users doesn't necessarily mean
> that clang is unusable. It just may not be usable for kernel
> development. The userland story, however, can be very different.

FreeBSD 10 is clang-based and still includes ZFS, which tracks the Illumos code base.

Bob

----- End forwarded message -----
--
Eugen Leitl, http://leitl.org
ICBM: 48.07100, 11.36820
http://www.ativel.com http://postbiota.org
8B29F6BE: 099D 78BA 2FD3 B014 B08A 7779 75B0 2443 8B29 F6BE
Re: [zfs-discuss] slow speed problem with a new SAS shelf
Hi,

Have you had a look at iostat -E (error counters) to make sure you don't have faulty cabling? I've had bad cables trip me up once in a manner similar to your situation here.

Cheers,
--
Saso

On 07/23/2012 07:18 AM, Yuri Vorobyev wrote:
> Hello.
>
> I faced a strange performance problem with a new disk shelf.
> We have been using a ZFS system with SATA disks for a while.
> It is a Supermicro SC846-E16 chassis with a Supermicro X8DTH-6F
> motherboard, 96GB RAM, and 24 HITACHI HDS723020BLA642 SATA disks attached
> to the onboard LSI 2008 controller.
>
> Pretty much satisfied with it, we bought an additional shelf with SAS
> disks for VM hosting. The new shelf is a Supermicro SC846-E26 chassis;
> the disk model is HITACHI HUS156060VLS600 (15K 600GB SAS2).
> An additional LSI 9205-8e controller was installed in the server and
> connected to the JBOD.
> I connected the JBOD with 2 channels and set up multipath first, but
> when I noticed the performance problem I disabled multipath and
> disconnected one cable (to make sure multipath was not the cause of the
> problem).
>
> Problem description follows:
>
> Creating a test pool with 5 pairs of mirrors (new shelf, SAS disks):
>
> # zpool create -o version=28 -O primarycache=none test \
>     mirror c9t5000CCA02A138899d0 c9t5000CCA02A102181d0 \
>     mirror c9t5000CCA02A13500Dd0 c9t5000CCA02A13316Dd0 \
>     mirror c9t5000CCA02A005699d0 c9t5000CCA02A004271d0 \
>     mirror c9t5000CCA02A004229d0 c9t5000CCA02A1342CDd0 \
>     mirror c9t5000CCA02A1251E5d0 c9t5000CCA02A1151DDd0
>
> (primarycache=none to disable ARC influence)
>
> Testing sequential write:
> # dd if=/dev/zero of=/test/zero bs=1M count=2048
> 2048+0 records in
> 2048+0 records out
> 2147483648 bytes (2.1 GB) copied, 1.04272 s, 2.1 GB/s
>
> iostat when writing looks like:
>    r/s     w/s  kr/s      kw/s  wait  actv  wsvc_t  asvc_t  %w  %b  device
>    0.0  1334.6   0.0  165782.9   0.0   8.4     0.0     6.3   1  86  c9t5000CCA02A1151DDd0
>    0.0  1345.5   0.0  169575.3   0.0   8.7     0.0     6.5   1  88  c9t5000CCA02A1342CDd0
>    2.0  1359.5   1.0  168969.8   0.0   8.7     0.0     6.4   1  90  c9t5000CCA02A13500Dd0
>    0.0  1358.5   0.0  168714.0   0.0   8.7     0.0     6.4   1  90  c9t5000CCA02A13316Dd0
>    0.0  1345.5   0.0      19.3   0.0   9.0     0.0     6.7   1  92  c9t5000CCA02A102181d0
>    1.0  1317.5   1.0  164456.9   0.0   8.5     0.0     6.5   1  88  c9t5000CCA02A004271d0
>    4.0  1342.5   2.0  166282.2   0.0   8.5     0.0     6.3   1  88  c9t5000CCA02A1251E5d0
>    0.0  1377.5   0.0  170515.5   0.0   8.7     0.0     6.3   1  90  c9t5000CCA02A138899d0
>
> Now read:
> # dd if=/test/zero of=/dev/null bs=1M
> 2048+0 records in
> 2048+0 records out
> 2147483648 bytes (2.1 GB) copied, 13.5681 s, 158 MB/s
>
> iostat when reading:
>    r/s  w/s     kr/s  kw/s  wait  actv  wsvc_t  asvc_t  %w  %b  device
>  106.0  0.0  11417.4   0.0   0.0   0.2     0.0     2.4   0  14  c9t5000CCA02A004271d0
>   80.0  0.0  10239.9   0.0   0.0   0.2     0.0     2.4   0  10  c9t5000CCA02A1251E5d0
>  110.0  0.0  12182.4   0.0   0.0   0.1     0.0     1.3   0   9  c9t5000CCA02A138899d0
>  102.0  0.0  11664.4   0.0   0.0   0.2     0.0     1.8   0  15  c9t5000CCA02A005699d0
>   99.0  0.0  10900.9   0.0   0.0   0.3     0.0     3.0   0  16  c9t5000CCA02A004229d0
>  107.0  0.0  11545.4   0.0   0.0   0.2     0.0     1.9   0  13  c9t5000CCA02A1151DDd0
>   81.0  0.0  10367.9   0.0   0.0   0.2     0.0     2.2   0  11  c9t5000CCA02A1342CDd0
>
> Unexpectedly low speed!
> Note the busy column: when writing it is about 90%, when reading about 15%.
>
> Individual disks' raw read speed (don't be confused by the name change; I
> connected the JBOD to another HBA channel):
>
> # dd if=/dev/dsk/c8t5000CCA02A13889Ad0 of=/dev/null bs=1M count=2000
> 2000+0 records in
> 2000+0 records out
> 2097152000 bytes (2.1 GB) copied, 10.9685 s, 191 MB/s
> # dd if=/dev/dsk/c8t5000CCA02A1342CEd0 of=/dev/null bs=1M count=2000
> 2000+0 records in
> 2000+0 records out
> 2097152000 bytes (2.1 GB) copied, 10.8024 s, 194 MB/s
>
> The 10-disk mirror zpool reads slower than a single disk.
>
> There is no tuning in /etc/system.
>
> I tried the test with a FreeBSD 8.3 live CD. Reads were the same (about
> 150MB/s). I also tried SmartOS, but it can't see disks behind the LSI
> 9205-8e controller.
>
> For comparison, this is the speed from the SATA pool (it consists of 4
> 6-disk raidz2 vdevs):
> # dd if=CentOS-6.2-x86_64-bin-DVD1.iso of=/dev/null bs=1M
> 4218+1 records in
> 4218+1 records out
> 4423129088 bytes (4.4 GB) copied, 4.76552 s, 928 MB/s
>
>     r/s  w/s      kr/s  kw/s  wait  actv  wsvc_t  asvc_t  %w   %b  device
> 13614.4  0.0  800338.5   0.0   0.1  36.0     0.0     2.6   0  914  c6
>   459.9  0.0   25761.4   0.0   0.0   0.8     0.0     1.8   0   22  c6t5000CCA369D16860d0
>    84.0  0.0    2785.2   0.0   0.0   0.2     0.0     3.0   0   13  c6t5000CCA369D1B1E0d0
>   836.9  0.0   50089.5   0.0   0.0   2.6     0.0     3.1   0   60  c6t5000CCA369D1B302d0
>   411.0  0.0   24492.6   0.0   0.0   0.8     0.0     2.1   0   25  c6t5000CCA369D16982d0
>   821.9  0.0   49385.1   0.0   0.0   3.0     0.0     3.7
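[As a sanity check on the figures above - my own back-of-the-envelope arithmetic, not part of the original message - dividing dd's byte counts by the elapsed times reproduces the reported rates and makes the anomaly plain: the ten-spindle pool reads slower than a single raw disk. The helper name `rate_mb_s` is hypothetical.]

```python
# Back-of-the-envelope check of the dd figures quoted in the message above.
# dd reports rates in powers of ten, so "MB/s" here means 10**6 bytes/s.
def rate_mb_s(nbytes: int, seconds: float) -> float:
    return nbytes / seconds / 1e6

pool_write  = rate_mb_s(2147483648, 1.04272)   # ~2.1 GB/s (writes are buffered)
pool_read   = rate_mb_s(2147483648, 13.5681)   # ~158 MB/s from the 10-disk pool
single_read = rate_mb_s(2097152000, 10.9685)   # ~191 MB/s from one raw disk

# The striped-mirror pool reads slower than one spindle: something is wrong.
print(round(pool_read), round(single_read))  # 158 191
```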
Re: [zfs-discuss] slow speed problem with a new SAS shelf
23.07.2012 19:39, Richard Elling wrote:
>> I faced with a strange performance problem with new disk shelf.
>> We a using ZFS system with SATA disks for a while.
> What OS and release?

Oh, I forgot this important thing. It is OpenIndiana oi_151a5 now.
Re: [zfs-discuss] Question on 4k sectors
2012-07-23 18:37, Anonymous wrote:
> Really, it would be so helpful to know which drives we can buy with
> confidence and which should be avoided... is there any way to know from
> the manufacturers' web sites or do you have to actually buy one and see
> what it does? Thanks to everyone for the info.

I think that vendors' markings like 512e may give a clue to their support of "emulated 512-byte sectors", whatever they mean by that for a specific model line.

I believe you can be roughly certain that all 3TB HDDs except Hitachi use 4KB native sectors, and that 4TB disks are all 4KB. If these disks don't expose such sector sizing to the OS properly, you can work around that in several ways, including, as of recent illumos changes, an override config file for the SCSI driver.

The main problem with "avoiding" 4KB drives seems to be just the cases where you want to replace a single disk in an older pool built with 512B-native sectored drives. For new pools (or rather new complete top-level VDEVs) this does not matter much, except that your overheads with small data blocks can get noticeably bigger.

There were statements on this list that drives emulating 512B sectors (whether they announce it properly or not) are not all inherently evil - this emulation by itself may be of some concern regarding performance, but not one of reliability. Then again, firmware errors are possible in any part of the stack, of both older and newer models ;)

HTH,
//Jim
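[The small-block overhead mentioned above comes from rounding every block up to a whole number of physical sectors. A minimal illustration - simplified, ignoring ZFS metadata, compression, and raidz padding; the function name is hypothetical:]

```python
# Illustrative only: space consumed when a logical block is rounded up to
# whole physical sectors.
def allocated(block_bytes: int, sector_bytes: int) -> int:
    sectors = -(-block_bytes // sector_bytes)  # ceiling division
    return sectors * sector_bytes

# A 1 KiB record fills two 512 B sectors exactly, but on a 4 KB-sector
# drive it consumes a whole 4 KiB sector: 4x the space.
print(allocated(1024, 512))   # 1024
print(allocated(1024, 4096))  # 4096
```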
Re: [zfs-discuss] Question on 4k sectors
"Hans J. Albertsson" wrote: > I think the problem is with disks that are 4k organised, but report > their blocksize as 512. > > If the disk reports it's blocksize correctly as 4096, then ZFS should > not have a problem. > At least my 2TB Seagate Barracuda disks seemed to report their > blocksizes as 4096, and my zpools on those machines have ashift set to > 12, which is correct, since 2¹² = 4096 Thanks, this is good to know. Is there any way, looking at manufacturers data sheets for drives, whether they report their blocksize correctly? From Seagate and WD that list the number of sectors, it's trivial to determine what sectors the disk is using. But is this number what the disk is really organized in or is it the number the disk reports?! It is very confusing... So far we seem to rely on reports from people on the list, which is good for us but bad for guys who wasted money on drives that don't work as they should (the drives that don't report actual sector sector size correctly). Really, it would be so helpful to know which drives we can buy with confidence and which should be avoided...is there any way to know from the manufacturers web sites or do you have to actually buy one and see what it does? Thanks to everyone for the info. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] slow speed problem with a new SAS shelf
On Jul 22, 2012, at 10:18 PM, Yuri Vorobyev wrote:
> Hello.
>
> I faced with a strange performance problem with new disk shelf.
> We a using ZFS system with SATA disks for a while.

What OS and release?
 -- richard

> [...]