Re: [zfs] [developer] Re: [smartos-discuss] an interesting survey -- the zpool with most disks you have ever built

2016-03-10 Thread InterNetX - Juergen Gotteswinter
www.bolthole.com/solaris/zrep/
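
For readers hitting the same "zrep?" question: zrep automates the
snapshot / incremental-send / bookkeeping cycle between two hosts. The
sketch below is not zrep itself, just the bare send/receive pattern such
tools wrap; the dataset names, remote host, and snapshot prefix are
hypothetical placeholders.

# Sketch (not zrep) of the snapshot + incremental send/receive cycle that
# replication tools like zrep automate. Names below are placeholders.
import subprocess
import time

SRC = "tank/data"              # hypothetical source dataset
DST_HOST = "standby.example"   # hypothetical replica host
DST = "tank/data"              # destination dataset on the replica
PREFIX = "repl"

def replicate(prev_snap):
    """Take a new snapshot and send it, incrementally if a base exists."""
    new_snap = f"{SRC}@{PREFIX}-{int(time.time())}"
    subprocess.run(["zfs", "snapshot", new_snap], check=True)

    send_cmd = ["zfs", "send"]
    if prev_snap:
        send_cmd += ["-i", prev_snap]    # incremental from the last snapshot
    send_cmd.append(new_snap)

    recv_cmd = ["ssh", DST_HOST, "zfs", "receive", "-F", DST]
    sender = subprocess.Popen(send_cmd, stdout=subprocess.PIPE)
    subprocess.run(recv_cmd, stdin=sender.stdout, check=True)
    sender.wait()
    return new_snap

# e.g. driven from cron or a loop, remembering the last snapshot sent:
#   last = None
#   while True:
#       last = replicate(last)
#       time.sleep(300)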

Am 09.03.2016 um 03:08 schrieb Manuel Amador (Rudd-O):
> On 03/09/2016 12:05 AM, Liam Slusser wrote:
>>
>> We use a slightly modified zrep to handle the replication between the two.
> 
> zrep?
> 


Re: [zfs] [developer] Re: [smartos-discuss] an interesting survey -- the zpool with most disks you have ever built

2016-03-07 Thread Fred Liu
2016-03-08 4:55 GMT+08:00 Liam Slusser:

> I don't have a 2000-drive array (that's amazing!), but I do have two
> 280-drive arrays which are in production.  Here are the generic stats:
>
> server setup:
> OpenIndiana oi_151
> 1 server rack
> Dell R720xd, 64 GB RAM, with mirrored 250 GB boot disks
> 5 x LSI 9207-8e dual-port SAS PCIe host bus adapters
> Intel 10G fibre Ethernet (dual port)
> 2 x SSD for log
> 2 x SSD for cache
> 23 x Dell MD1200 with 3T, 4T, or 6T NL-SAS disks (a mix of Toshiba, Western
> Digital, and Seagate drives - basically whatever Dell sends)
>
> zpool setup:
> 23 x 12-disk raidz2 vdevs glued together.  276 total disks.  Basically each
> new 12-disk MD1200 shelf is a new raidz2 vdev added to the pool.
>
> Total size: ~797T
>
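
A rough sanity check of the layout above, assuming an average drive size
across the 3T/4T/6T mix; this ignores TB-vs-TiB accounting and filesystem
overhead, so it is only a ballpark figure.

# 23 raidz2 vdevs of 12 disks each leave 10 data disks per vdev.
VDEVS = 23
DISKS_PER_VDEV = 12
PARITY = 2                 # raidz2
AVG_DISK_TB = 3.5          # assumed average over the 3T/4T/6T mix

data_disks = VDEVS * (DISKS_PER_VDEV - PARITY)   # 230 data disks
usable_tb = data_disks * AVG_DISK_TB             # ~805 T, near the ~797T quoted
print(f"{data_disks} data disks, ~{usable_tb:.0f} T usable (rough estimate)")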
> We have an identical server to which we replicate changes via zfs snapshots
> every few minutes.  The whole setup has been up and running for a few years
> now, no issues.  As we run low on space we purchase two additional MD1200
> shelves (one for each system) and add the new raidz2 into the pool on-the-fly.
>
> The only real issue we've had is that sometimes a disk fails in such a way
> (think Monty Python and the Holy Grail: "I'm not dead yet") that the disk
> hasn't failed outright but is timing out and slows the whole array to a
> standstill until we can manually find and remove it.  Another problem is
> that once a disk has been replaced, the resilver process can sometimes take
> an eternity.  We have also found that the snapshot replication process can
> interfere with the resilver process - the resilver gets stuck at 99% and
> never ends - so we end up stopping replication, or doing only one
> replication a day, until the resilver is done.
>
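
One simple way to work around the replication-vs-resilver interference
described above is to skip the periodic send while the pool reports a
resilver in progress. This is only a sketch: the pool name is a
placeholder, and the exact `zpool status` wording matched here should be
verified on your own system.

import subprocess

POOL = "tank"   # placeholder pool name

def resilver_in_progress(pool):
    """Return True if `zpool status <pool>` reports an active resilver."""
    out = subprocess.run(["zpool", "status", pool],
                         capture_output=True, text=True, check=True).stdout
    return "resilver in progress" in out

# In the replication scheduler:
#   if resilver_in_progress(POOL):
#       skip this cycle and retry later instead of sending a snapshot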
> The last helpful hint I have is lowering all the drive timeouts; see
> http://everycity.co.uk/alasdair/2011/05/adjusting-drive-timeouts-with-mdb-on-solaris-or-openindiana/
> for details.
>
[Fred]: A zpool with 280 drives in production is pretty big! I think the 2000
drives were just in testing. It is true that huge pools have lots of
operational challenges. I have run into a similar sluggishness issue caused by
a dying disk. Just curious, what is the cluster software used in
http://everycity.co.uk/alasdair/2011/05/adjusting-drive-timeouts-with-mdb-on-solaris-or-openindiana/ ?

Thanks.

Fred



Re: [zfs] [developer] Re: [smartos-discuss] an interesting survey -- the zpool with most disks you have ever built

2016-03-06 Thread Richard Elling

> On Mar 6, 2016, at 9:06 PM, Fred Liu wrote:
> 
> 
> 
> 2016-03-06 22:49 GMT+08:00 Richard Elling:
> 
>> On Mar 3, 2016, at 8:35 PM, Fred Liu wrote:
>> 
>> Hi,
>> 
>> Today, while reading the introduction to Jeff's new nuclear weapon -- DSSD
>> D5's CUBIC RAID -- an interesting survey question popped into my head: what
>> is the zpool with the most disks you have ever built?
> 
> We test to 2,000 drives. Beyond 2,000 there are some scalability issues that 
> impact failover times.
> We’ve identified these and know what to fix, but need a real customer at this 
> scale to bump it to
> the top of the priority queue.
> 
> [Fred]: Wow! 2000 drives need almost 4~5 whole racks!
>> 
>> Since ZFS doesn't support nested vdevs, the maximum fault tolerance should
>> be three (from raidz3).
> 
> Pedantically, it is N, because you can have N-way mirroring.
>  
> [Fred]: Yeah. That is just pedantic. N-way mirroring of every disk works in
> theory but rarely happens in reality.
> 
>> That is a limitation if you want to build a very huge pool.
> 
> Scaling redundancy by increasing parity improves data loss protection by
> about 3 orders of magnitude. Adding capacity by striping across N vdevs
> reduces data loss protection by a factor of N. This is why there is not much
> need to go beyond raidz3. However, if you do want to go there, adding
> raidz4+ is relatively easy.
> 
> [Fred]: I assume you used striped raidz3 vdevs in your storage mesh of 2000
> drives. If that is true, the probability of 4 failures out of 2000 drives
> will not be so low. Plus, resilvering takes longer when a single disk has
> bigger capacity. And further, the cost of over-provisioning spare disks vs.
> raidz4+ could be a worthwhile trade-off when the storage mesh is at the
> scale of 2000 drives.

Please don't assume; you'll just hurt yourself :-)
For example, do not assume the only option is striping across raidz3 vdevs. 
Clearly, there are many
different options.
 -- richard
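
A back-of-the-envelope way to see both points in this exchange (each extra
parity level buys orders of magnitude of protection; striping across N vdevs
multiplies the exposure by N), using a deliberately crude model: assume each
disk independently fails within some exposure window with probability p, and
ignore resilver dynamics and correlated failures. The numbers are illustrative
only, not a real reliability model.

from math import comb

def vdev_loss_prob(disks, parity, p):
    """Leading term of P(more than `parity` disks in one vdev fail together)."""
    return comb(disks, parity + 1) * p ** (parity + 1)

p = 0.001              # assumed per-disk failure probability in the window
disks_per_vdev = 12
vdevs = 167            # ~2000 drives / 12 disks per vdev

for parity in (1, 2, 3):
    per_vdev = vdev_loss_prob(disks_per_vdev, parity, p)
    pool = vdevs * per_vdev          # striping multiplies exposure by N vdevs
    print(f"raidz{parity}: per-vdev ~{per_vdev:.1e}, {vdevs}-vdev pool ~{pool:.1e}")

# Each extra parity level multiplies the per-vdev term by roughly p (times a
# combinatorial factor), i.e. orders of magnitude for small p, while adding
# vdevs scales the pool-level risk back up linearly in N.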




