Re: [ceph-users] RAID question for Ceph

2018-07-19 Thread Willem Jan Withagen

On 19/07/2018 13:28, Satish Patel wrote:

Thanks for the massive amount of detail. So what options do I have? Can I disable the RAID 
controller, run the system without RAID, and use software RAID for the OS?


Not sure what kind of RAID controller you have. I seem to recall an HP 
thingy? Those I don't trust at all in HBA mode; I've heard too many bad 
things about them: they keep messing with the communication to the disks.

Also, I'm not sure you can get a firmware version for them that does HBA only.


Does that make sense?


Well, I run ZFS on FreeBSD, and usually run a ZFS mirror for my OS disks.
I guess that for the OS partition it does not really matter what you do.
Even RAIDing it on the controller is not that important; Linux will be able 
to manage that. And your OS disks are not going to be larger than 4-6 TB, so 
recovery times are relatively OK and there are no serious performance requirements.


So you could do either.

Normally, for a ZFS/Ceph system, we would:
 - have 2 small disks mirrored for the OS. Nowadays you can get a
   64-128 GB SATA DOM for this, which saves 2 trays in the front;
   or get a cabinet with two 2.5" bays in the back,
   connected to the motherboard.
 - for a 24-tray cabinet:
   - a disk-tray backplane with individual lanes to each disk
     (you have to specifically ask SuperMicro for that)
   - a motherboard with at least 3 PCIe x8 slots and 2x 10G onboard
     (I would prefer 3 x16 slots, but those are relatively rare or
     not fully wired, and they require CPUs with 48 lanes)
   - 3 LSI 9207-8i HBAs to connect the trays.
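
As a rough sanity check of that sizing (a sketch in Python with assumed
round numbers, not measurements: ~200 MB/s sustained per HDD, PCIe 3.0 at
roughly 985 MB/s per lane):

    # Back-of-the-envelope throughput budget for the 24-tray HDD node above.
    hdds_mb_s = 24 * 200                  # ~4800 MB/s aggregate from the disks
    hbas_mb_s = 3 * 8 * 985               # three x8 PCIe 3.0 HBAs: ~23640 MB/s
    nics_mb_s = 2 * 10_000 // 8           # 2x 10GbE: ~2500 MB/s to the network
    print(hdds_mb_s, hbas_mb_s, nics_mb_s)

With individual lanes to every disk and three x8 HBAs, the disks and the
controllers have plenty of headroom; the 2x 10G uplinks end up being the
limiting factor.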

--WjW



Re: [ceph-users] RAID question for Ceph

2018-07-19 Thread Satish Patel
Thanks for the massive amount of detail. So what options do I have? Can I disable the RAID 
controller, run the system without RAID, and use software RAID for the OS?

Does that make sense?

Sent from my iPhone



Re: [ceph-users] RAID question for Ceph

2018-07-19 Thread Willem Jan Withagen

On 19/07/2018 10:53, Simon Ironside wrote:

On 19/07/18 07:59, Dietmar Rieder wrote:


We have P840ar controllers with battery backed cache in our OSD nodes
and configured an individual RAID-0 for each OSD (ceph luminous +
bluestore). We have not seen any problems with this setup so far and
performance is great at least for our workload.


I'm doing the same with LSI RAID controllers for the same reason, to 
take advantage of the battery-backed cache. No problems with this here 
either. As Troy said, you do need to go through the additional step of 
creating a single-disk RAID-0 whenever you replace a disk, which you 
wouldn't need with a regular HBA.


This discussion has been running on the ZFS lists for quite some time, and 
at some length.
ZFS really does depend on the software having direct access to the disks, 
without extra abstraction layers in between.
And with both ZFS and Ceph, RAID is dead: these newly designed storage 
systems solve problems that RAID no longer can.
(Read up on why newer RAID versions will not really save you from a crashed 
disk: on today's large disks the rebuild takes so long that another failure 
or read error during recovery becomes a realistic event.)
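
To put a rough number on that argument (a sketch with commonly quoted,
assumed figures; the classic RAID-5 URE calculation, not a claim about any
particular drive):

    import math

    # Chance of hitting at least one unrecoverable read error (URE) while
    # re-reading the surviving disks during a RAID-5 rebuild.
    ure_per_bit = 1e-14                          # typical spec for desktop-class HDDs
    bits_reread = 3 * 4e12 * 8                   # three surviving 4 TB disks
    expected_ures = ure_per_bit * bits_reread    # ~0.96
    p_bad_rebuild = 1 - math.exp(-expected_ures) # Poisson approximation
    print(f"{p_bad_rebuild:.0%}")                # ~62% chance the rebuild hits a URE

With 10+ TB disks the numbers only get worse, and the rebuild itself can
take the better part of a day, which is the window that argument is about.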


The basic fact remains that RAID controllers sort of lie to their users, and 
the advanced ones with backup batteries even more so. If everything is well 
in paradise you will usually get away with it. But if not, that expensive 
piece of hardware will turn everything into cr..p.


For example, a lot of LSI firmware has had bugs in it; especially the 
Enterprise versions can do really weird things. That is why we install the 
IT version of the firmware, to cripple the RAID functionality as much as 
one can. It turns your expensive RAID controller into basically just a 
plain HBA (and no more extra configuration steps when adding disks).


So unless you HAVE to take it, because you cannot rule it out in the 
system configurator while buying: go for the simple controllers that 
can act as an HBA.


There are a few more things to consider, like:
 - What is the bandwidth on the disk carrier backplane?
   What kind of port multipliers/expanders are used, and is the design
   what it should be? I've seen boards with 2 multipliers where it turned
   out that only one was used, and the other could only be used for
   multipath... So: is the feed to the multiplier going to be a
   bottleneck?
 - How many lanes of your expensive multi-lane SAS/SATA HBA are
   actually used?
   I have seen 24-tray backplanes that want to run over 2 or 4 SAS
   lanes, even when you think you are using all 8 lanes from the
   HBA because you have 2 SFF-8087 cables.
   It is not without reason that SuperMicro also offers a disk-tray
   backplane with 24 individually wired SAS/SATA ports.
   Just ordering the basic cabinet will probably get you the wrong
   stuff.
 - And once you have sort of fixed the bottlenecks there: can you actually
   run all disks at full speed over the controller to the PCIe bus(ses)?
   Even a 16-lane PCIe 3.0 slot will at its theoretical best do roughly
   16 GByte/s, and the x8 slot of a typical HBA only half of that. Now
   connect a bunch of 12 Gb/s SAS SSDs to it and watch the bottleneck
   appear; even with more than 20 HDDs it is going to get crowded on
   that one controller.
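
To illustrate the kind of arithmetic involved (a sketch with assumed,
typical numbers, not a statement about any specific backplane or HBA):

    # Rough bandwidth budget for a 24-bay chassis behind one controller.
    sas3_lane_mb_s = 1200                  # 12 Gb/s SAS lane: ~1.2 GB/s usable
    pcie3_lane_mb_s = 985                  # PCIe 3.0 lane after 128b/130b encoding

    expander_uplink = 4 * sas3_lane_mb_s   # backplane fed by one 4-lane cable: ~4800 MB/s
    hba_x8_slot = 8 * pcie3_lane_mb_s      # a typical x8 HBA tops out near ~7900 MB/s

    hdd_demand = 24 * 200                  # 24 HDDs: ~4800 MB/s, already at the uplink limit
    ssd_demand = 24 * 1100                 # 24 SAS SSDs: ~26400 MB/s, far beyond either limit
    print(expander_uplink, hba_x8_slot, hdd_demand, ssd_demand)

Which is exactly why an individually wired backplane, or more than one HBA,
is worth asking for.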

Normally I'd say: Lies, damned lies, and statistics.
But in this case: Lies, damned lies and hardware. 8-D

--WjW


Re: [ceph-users] RAID question for Ceph

2018-07-19 Thread Simon Ironside

On 19/07/18 07:59, Dietmar Rieder wrote:


We have P840ar controllers with battery backed cache in our OSD nodes
and configured an individual RAID-0 for each OSD (ceph luminous +
bluestore). We have not seen any problems with this setup so far and
performance is great at least for our workload.


I'm doing the same with LSI RAID controllers for the same reason, to 
take advantage of the battery-backed cache. No problems with this here 
either. As Troy said, you do need to go through the additional step of 
creating a single-disk RAID-0 whenever you replace a disk, which you 
wouldn't need with a regular HBA.


Simon


Re: [ceph-users] RAID question for Ceph

2018-07-19 Thread Marco Gaiarin
Hi Troy Ablan!
  In that message you wrote...

> Even worse, the P410i doesn't appear to support a pass-thru (JBOD/HBA)
> mode, so your only sane option for using this card is to create RAID-0s.

I can confirm. Even worse: without the (expensive!) cache module, the P410i
can only define a maximum of 2 'arrays' (even when an 'array' is just a
single disk in RAID-0).

Digging around, I've found references to alternative firmware that enables
JBOD/HBA mode, but I have never gone down that path...

-- 
dott. Marco Gaiarin GNUPG Key ID: 240A3D66
  Associazione ``La Nostra Famiglia''  http://www.lanostrafamiglia.it/
  Polo FVG   -   Via della Bontà, 7 - 33078   -   San Vito al Tagliamento (PN)
  marco.gaiarin(at)lanostrafamiglia.it   t +39-0434-842711   f +39-0434-842797



Re: [ceph-users] RAID question for Ceph

2018-07-19 Thread Dietmar Rieder
On 07/19/2018 04:44 AM, Satish Patel wrote:
> If I have 8 OSD drives in a server on a P410i RAID controller (HP), and I
> want to use this server as an OSD node, how should I configure the RAID?
> 
> 1. Put all drives in one RAID-0?
> 2. Put each individual HDD in its own RAID-0, i.e. create 8 separate
>    RAID-0s so the OS can see 8 separate HDD drives?
> 
> What are most people doing in production for Ceph (BlueStore)?


We have P840ar controllers with battery backed cache in our OSD nodes
and configured an individual RAID-0 for each OSD (ceph luminous +
bluestore). We have not seen any problems with this setup so far and
performance is great at least for our workload.

Dietmar





Re: [ceph-users] RAID question for Ceph

2018-07-18 Thread Troy Ablan



On 07/18/2018 07:44 PM, Satish Patel wrote:
> If I have 8 OSD drives in a server on a P410i RAID controller (HP), and I
> want to use this server as an OSD node, how should I configure the RAID?
> 
> 1. Put all drives in one RAID-0?
> 2. Put each individual HDD in its own RAID-0, i.e. create 8 separate
>    RAID-0s so the OS can see 8 separate HDD drives?
> 
> What are most people doing in production for Ceph (BlueStore)?

In my experience, using a RAID card is not ideal for storage systems
like Ceph.  Redundancy comes from replicating data across multiple
hosts, so there's no need for this functionality in a disk controller.
Even worse, the P410i doesn't appear to support a pass-thru (JBOD/HBA)
mode, so your only sane option for using this card is to create RAID-0s.
Whenever you need to replace a bad drive, you will need to go through
the extra step of creating a RAID-0 on the new drive.
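
As a quick illustration of where the redundancy actually lives (a sketch
with assumed cluster sizes; the factor of 3 is simply Ceph's default
replication count for replicated pools):

    # Usable capacity with Ceph replication across hosts (assumed sizes).
    hosts = 5
    osds_per_host = 8
    tb_per_osd = 4
    replicas = 3                              # Ceph's default pool size
    raw_tb = hosts * osds_per_host * tb_per_osd
    print(raw_tb, round(raw_tb / replicas))   # 160 TB raw -> ~53 TB usable

Losing a disk, or even a whole host, still leaves the other replicas
intact, which is why the disk controller itself does not need to provide
any redundancy.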

In a production environment, I would recommend an HBA that exposes all
of the drives directly to the OS. It makes management and monitoring a
lot easier.

-Troy