Re: [zfs-discuss] Question on 4k sectors

2012-07-23 Thread Bob Friesenhahn

On Mon, 23 Jul 2012, Anonymous Remailer (austria) wrote:


The question was about some older boxes running S10 that we're not planning
to upgrade, keeping them alive as long as possible...


Recent Solaris 10 kernel patches address drives with 4k sectors. It appears
that Solaris 10 will work with 4k-sector drives, so Solaris 10 users will not
be stuck.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Question on 4k sectors

2012-07-23 Thread Anonymous Remailer (austria)

You wrote:

> 2012-07-23 18:37, Anonymous wrote:
> > Really, it would be so helpful to know which drives we can buy with
> > confidence and which should be avoided... Is there any way to know from the
> > manufacturers' web sites, or do you have to actually buy one and see what it
> > does? Thanks to everyone for the info.
> 
> I think that vendor markings like 512e may give a clue about
> their support for "emulated 512-byte sectors", whatever they
> mean by that for a specific model line.

Yeah, but buying through the mail it's awfully difficult to see the vendor
markings until it's too late ;)

> I believe you can be fairly certain that all 3 TB HDDs except
> Hitachi use 4 KB native sectors, and that 4 TB disks are all 4 KB.
> If these disks don't expose their sector size to the OS
> properly, you can work around that in several ways, including,
> as of recent illumos changes, an override config file for the
> SCSI driver.

The question was about some older boxes running S10 that we're not planning
to upgrade, keeping them alive as long as possible...

> The main problem with "avoiding" 4 KB drives seems to be just
> the case where you want to replace a single disk in an older
> pool built with 512-byte-native sectored drives.

Right, that's what we're concerned with.

> For new pools (or rather new complete top-level VDEVs) this does not
> matter much, except that your overheads with small data blocks can get
> noticeably bigger.

Understood.

> There have been statements on this list that drives emulating
> 512-byte sectors (whether they announce it properly or not) are
> not all inherently evil - the emulation by itself may be a
> performance concern, but not a reliability one. Then again,
> firmware errors are possible in any part of the stack, in both
> older and newer models ;)

I haven't seen any post suggesting that 512-byte emulation doesn't have very
adverse effects on performance. Given how touchy ZFS seems to be, I don't want
to give him any excuses! Thanks for your post.


Re: [zfs-discuss] [zfs] LZ4 compression algorithm

2012-07-23 Thread Eugen Leitl
- Forwarded message from Bob Friesenhahn -

From: Bob Friesenhahn 
Date: Mon, 23 Jul 2012 12:55:44 -0500 (CDT)
To: z...@lists.illumos.org
cc: Radio młodych bandytów ,
Pawel Jakub Dawidek , develo...@lists.illumos.org
Subject: Re: [zfs] LZ4 compression algorithm
User-Agent: Alpine 2.01 (GSO 1266 2009-07-14)
Reply-To: z...@lists.illumos.org

On Mon, 23 Jul 2012, Sašo Kiselkov wrote:
>
> Anyway, the mere caring for clang by ZFS users doesn't necessarily mean
> that clang is unusable. It just may not be usable for kernel
> development. The userland story, however, can be very different.

FreeBSD 10 is clang-based and still includes ZFS which tracks the Illumos 
code-base.

Bob
-- 
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/




- End forwarded message -
-- 
Eugen* Leitl http://leitl.org
__
ICBM: 48.07100, 11.36820 http://www.ativel.com http://postbiota.org
8B29F6BE: 099D 78BA 2FD3 B014 B08A  7779 75B0 2443 8B29 F6BE


Re: [zfs-discuss] slow speed problem with a new SAS shelf

2012-07-23 Thread Sašo Kiselkov
Hi,

Have you had a look at iostat -E (error counters) to make sure you don't
have faulty cabling? I've had bad cables trip me up once in a manner similar
to your situation here.
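
Something like this should dump the per-device counters (the grep is only
there to trim the output, adjust to taste):

# iostat -En | grep Errors

Non-zero Transport Errors in particular tend to point at cabling, expander
or HBA path problems rather than at the disks themselves.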

Cheers,
--
Saso

On 07/23/2012 07:18 AM, Yuri Vorobyev wrote:
> Hello.
> 
> I am facing a strange performance problem with a new disk shelf.
> We have been using a ZFS system with SATA disks for a while.
> It is a Supermicro SC846-E16 chassis, Supermicro X8DTH-6F motherboard with
> 96Gb RAM and 24 HITACHI HDS723020BLA642 SATA disks attached to the onboard
> LSI 2008 controller.
> 
> Pretty much satisfied with it, we bought an additional shelf with SAS disks
> for VM hosting. The new shelf is a Supermicro SC846-E26 chassis. The disk
> model is HITACHI HUS156060VLS600 (15K 600Gb SAS2).
> An additional LSI 9205-8e controller was installed in the server and
> connected to the JBOD.
> I connected the JBOD with 2 channels and set up multipath first, but when I
> noticed the performance problem I disabled multipath and disconnected one
> cable (to be sure that multipath is not the cause of the problem).
> 
> Problem description follows:
> 
> Creating a test pool with 5 pairs of mirrors (new shelf, SAS disks)
> 
> # zpool create -o version=28 -O primarycache=none test mirror
> c9t5000CCA02A138899d0 c9t5000CCA02A102181d0 mirror c9t5000CCA02A13500Dd0
> c9t5000CCA02A13316Dd0 mirror c9t5000CCA02A005699d0 c9t5000CCA02A004271d0
> mirror c9t5000CCA02A004229d0 c9t5000CCA02A1342CDd0 mirror
> c9t5000CCA02A1251E5d0 c9t5000CCA02A1151DDd0
> 
> (primarycache=none) to disable ARC influence
> 
> 
> Testing sequential write
> # dd if=/dev/zero of=/test/zero bs=1M count=2048
> 2048+0 records in
> 2048+0 records out
> 2147483648 bytes (2.1 GB) copied, 1.04272 s, 2.1 GB/s
> 
> iostat when writing looks like
>     r/s     w/s  kr/s     kw/s wait actv wsvc_t asvc_t  %w  %b device
>     0.0  1334.6   0.0 165782.9  0.0  8.4    0.0    6.3   1  86 c9t5000CCA02A1151DDd0
>     0.0  1345.5   0.0 169575.3  0.0  8.7    0.0    6.5   1  88 c9t5000CCA02A1342CDd0
>     2.0  1359.5   1.0 168969.8  0.0  8.7    0.0    6.4   1  90 c9t5000CCA02A13500Dd0
>     0.0  1358.5   0.0 168714.0  0.0  8.7    0.0    6.4   1  90 c9t5000CCA02A13316Dd0
>     0.0  1345.5   0.0     19.3  0.0  9.0    0.0    6.7   1  92 c9t5000CCA02A102181d0
>     1.0  1317.5   1.0 164456.9  0.0  8.5    0.0    6.5   1  88 c9t5000CCA02A004271d0
>     4.0  1342.5   2.0 166282.2  0.0  8.5    0.0    6.3   1  88 c9t5000CCA02A1251E5d0
>     0.0  1377.5   0.0 170515.5  0.0  8.7    0.0    6.3   1  90 c9t5000CCA02A138899d0
> 
> Now read
> # dd if=/test/zero of=/dev/null  bs=1M
> 2048+0 records in
> 2048+0 records out
> 2147483648 bytes (2.1 GB) copied, 13.5681 s, 158 MB/s
> 
> iostat when reading
>     r/s     w/s    kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
>   106.0     0.0 11417.4    0.0  0.0  0.2    0.0    2.4   0  14 c9t5000CCA02A004271d0
>    80.0     0.0 10239.9    0.0  0.0  0.2    0.0    2.4   0  10 c9t5000CCA02A1251E5d0
>   110.0     0.0 12182.4    0.0  0.0  0.1    0.0    1.3   0   9 c9t5000CCA02A138899d0
>   102.0     0.0 11664.4    0.0  0.0  0.2    0.0    1.8   0  15 c9t5000CCA02A005699d0
>    99.0     0.0 10900.9    0.0  0.0  0.3    0.0    3.0   0  16 c9t5000CCA02A004229d0
>   107.0     0.0 11545.4    0.0  0.0  0.2    0.0    1.9   0  13 c9t5000CCA02A1151DDd0
>    81.0     0.0 10367.9    0.0  0.0  0.2    0.0    2.2   0  11 c9t5000CCA02A1342CDd0
> 
> Unexpectedly low speed! Note the busy column: when writing it is about 90%,
> when reading only about 15%.
> 
> Individual disk raw read speed (don't be confused by the name change; I
> connected the JBOD to another HBA channel):
> 
> # dd if=/dev/dsk/c8t5000CCA02A13889Ad0 of=/dev/null bs=1M count=2000
> 2000+0 records in
> 2000+0 records out
> 2097152000 bytes (2.1 GB) copied, 10.9685 s, 191 MB/s
> # dd if=/dev/dsk/c8t5000CCA02A1342CEd0 of=/dev/null bs=1M count=2000
> 2000+0 records in
> 2000+0 records out
> 2097152000 bytes (2.1 GB) copied, 10.8024 s, 194 MB/s
> 
> The 10-disk mirror zpool reads slower than a single disk.
> 
> There is no tuning in /etc/system
> 
> I tried the test with a FreeBSD 8.3 live CD. Reads were the same (about
> 150 MB/s). I also tried SmartOS, but it can't see disks behind the LSI
> 9205-8e controller.
> 
> For comparison, this is the speed from the SATA pool (it consists of 4
> 6-disk raidz2 vdevs):
> #dd if=CentOS-6.2-x86_64-bin-DVD1.iso of=/dev/null bs=1M
> 4218+1 records in
> 4218+1 records out
> 4423129088 bytes (4.4 GB) copied, 4.76552 s, 928 MB/s
> 
>      r/s     w/s     kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
>  13614.4     0.0 800338.5    0.0  0.1 36.0    0.0    2.6   0 914 c6
>    459.9     0.0  25761.4    0.0  0.0  0.8    0.0    1.8   0  22 c6t5000CCA369D16860d0
>     84.0     0.0   2785.2    0.0  0.0  0.2    0.0    3.0   0  13 c6t5000CCA369D1B1E0d0
>    836.9     0.0  50089.5    0.0  0.0  2.6    0.0    3.1   0  60 c6t5000CCA369D1B302d0
>    411.0     0.0  24492.6    0.0  0.0  0.8    0.0    2.1   0  25 c6t5000CCA369D16982d0
>    821.9     0.0  49385.1    0.0  0.0  3.0    0.0    3.7   0  67 c6t5000CCA369CFBDA3d0
>    231.0     0.0  12292.5    0.0  0.0  0.5    0.0    2.3   0  18 c6t5000CCA369D17E73d0
>    803.9     0.0   5009

Re: [zfs-discuss] slow speed problem with a new SAS shelf

2012-07-23 Thread Yuri Vorobyev

On 23.07.2012 19:39, Richard Elling wrote:


> > I am facing a strange performance problem with a new disk shelf.
> > We have been using a ZFS system with SATA disks for a while.
> 
> What OS and release?

Oh. I forgot this important thing.
It is OpenIndiana oi_151a5 now.




Re: [zfs-discuss] Question on 4k sectors

2012-07-23 Thread Jim Klimov

2012-07-23 18:37, Anonymous wrote:

Really, it would be so helpful to know which drives we can buy with
confidence and which should be avoided... Is there any way to know from the
manufacturers' web sites, or do you have to actually buy one and see what it
does? Thanks to everyone for the info.


I think that vendor markings like 512e may give a clue about
their support for "emulated 512-byte sectors", whatever they
mean by that for a specific model line.

I believe you can be fairly certain that all 3 TB HDDs except
Hitachi use 4 KB native sectors, and that 4 TB disks are all 4 KB.
If these disks don't expose their sector size to the OS
properly, you can work around that in several ways, including,
as of recent illumos changes, an override config file for the
SCSI driver.
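
If I remember right, that override lives in /kernel/drv/sd.conf; a rough
sketch (the vendor/model string below is only a placeholder - match it to
what your drive actually reports, with the vendor field padded to 8
characters):

sd-config-list =
    "ATA     EXAMPLE-MODEL-123", "physical-block-size:4096";

After editing the file, "update_drv -vf sd" (or a reboot) should make the
sd driver pick up the new entry.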

The main problem with "avoiding" 4 KB drives seems to be just
the case where you want to replace a single disk in an older
pool built with 512-byte-native sectored drives. For new pools
(or rather new complete top-level VDEVs) this does not matter
much, except that your overheads with small data blocks can
get noticeably bigger.

There have been statements on this list that drives emulating
512-byte sectors (whether they announce it properly or not) are
not all inherently evil - the emulation by itself may be a
performance concern, but not a reliability one. Then again,
firmware errors are possible in any part of the stack, in both
older and newer models ;)

HTH,
//Jim




Re: [zfs-discuss] Question on 4k sectors

2012-07-23 Thread Anonymous
"Hans J. Albertsson"  wrote:

> I think the problem is with disks that are 4k organised, but report 
> their blocksize as 512.
> 
> If the disk reports its blocksize correctly as 4096, then ZFS should 
> not have a problem.
> At least my 2TB Seagate Barracuda disks seemed to report their 
> blocksizes as 4096, and my zpools on those machines have ashift set to 
> 12, which is correct, since 2¹² = 4096

Thanks, this is good to know. Is there any way to tell, looking at
manufacturers' data sheets for drives, whether they report their blocksize
correctly? For Seagate and WD data sheets that list the number of sectors,
it's trivial to determine what sector size the disk is using. But is this
number what the disk is really organized in, or is it the number the disk
reports?! It is very confusing...
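
Once a drive is actually in hand, smartmontools will at least show what the
drive itself claims (just a sketch, assuming smartctl is installed; the
device path is only an example, and output is trimmed):

# smartctl -i /dev/rdsk/c5t0d0
...
Sector Sizes:     512 bytes logical, 4096 bytes physical

A 512e drive should show up like the above; a true 4Kn drive should report
4096 for both.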

So far we seem to rely on reports from people on the list, which is good for
us but bad for the guys who wasted money on drives that don't work as they
should (the drives that don't report their actual sector size correctly).

Really, it would be so helpful to know which drives we can buy with
confidence and which should be avoided... Is there any way to know from the
manufacturers' web sites, or do you have to actually buy one and see what it
does? Thanks to everyone for the info.
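
For what it's worth, once a pool exists, zdb shows which ashift ZFS actually
chose for each top-level vdev; a quick sketch (the pool name here is just an
example):

# zdb -C tank | grep ashift
            ashift: 12

ashift 12 means 4096-byte allocation units (2^12), while ashift 9 means
512-byte ones.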



Re: [zfs-discuss] slow speed problem with a new SAS shelf

2012-07-23 Thread Richard Elling
On Jul 22, 2012, at 10:18 PM, Yuri Vorobyev wrote:

> Hello.
> 
> I am facing a strange performance problem with a new disk shelf.
> We have been using a ZFS system with SATA disks for a while.

What OS and release?
 -- richard
