Re: [ceph-users] what happen to the OSDs if the OS disk dies?

2016-10-31 Thread Elias Abacioglu
Hi Felix,

I have experience running Ceph on SATADOM on R630, and it was kind of
bad because we got bad SATADOMs from Dell.
If you are going to use SATADOM, make sure to buy directly from an Innodisk
reseller and not from Dell.
We bought our SATADOMs from Dell and they degraded in 5-6 months. The
reason is that Dell is too cheap to get decent SATADOMs: Innodisk makes SATADOMs
both with and without TRIM, and Dell resells the ones without TRIM, the same
ones it uses for its Nutanix XC Series.

And here is the quirk. You won't find a regular 4-pin Molex power connector inside
the Dell R series; there is a small 4-pin power connector on the mobo next to the
internal SATA slots, but it's not a regular 4-pin 12V ATX connector, it is smaller.

So unless you can get Dell to sell custom power cables or better SATADOMs,
you need to buy the SATADOM from Dell, which includes a small cable that
fits, and then you are screwed in a couple of months because their SATADOM
doesn't do TRIM.

I've also tried using the R630 internal USB port with a Corsair Voyager GTX USB
flash drive, which supports TRIM, but unfortunately Linux (we are running
kernel 4.4.0) does not send TRIM over USB. TRIM does work with that USB drive
under MS Windows; I tested that on my laptop using VirtualBox.
So these USB drives will degrade as well.
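
If you want to check this yourself, here is a rough way to see whether Linux
actually exposes discard/TRIM for a given device (assuming the drive shows up
as /dev/sdb and is mounted at /var/lib/ceph; adjust to your setup):

    # does the kernel report discard capability for the device?
    lsblk --discard /dev/sdb
    # non-zero DISC-GRAN / DISC-MAX columns mean discard is available
    cat /sys/block/sdb/queue/discard_max_bytes
    # 0 here means no discard support at all

    # try an actual TRIM pass on the mounted filesystem
    fstrim -v /var/lib/ceph
    # fails with "the discard operation is not supported" when TRIM is not
    # passed through, which matches the USB behaviour described above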

Whenever you are trying to do something smart, there is always a quirk it
seems.

/Elias

On Tue, Aug 16, 2016 at 10:43 AM, Félix Barbeira 
wrote:

> Thanks everybody for the answers, it really helped me a lot. So, to sum
> up, these are the options that I have:
>
>
>- OS in a RAID1.
>   - PROS: the cluster is protected against OS failures. If one of
>   these disks fails, it can be easily replaced because it is
>   hot-swappable.
>   - CONS: we are "wasting" 2 disk bays that could be dedicated
>   to OSDs.
>
> * In the case of the R730xd we have the option to put 2x2.5" SSD disks in
> the slots on the back, like Brian says. For me this is clearly the best
> option. We'll see if the finance department has the same opinion :)
>
>
>- OS on a single disk.
>   - PROS: we are using only 1 disk slot. It could be a cheaper disk than
>   the 4TB model because we are only going to use ~10GB.
>   - CONS: the OS is not protected against failures, and if this disk
>   fails, the OSDs in this machine (11) fail too. In this case we might try
>   to adjust the configuration so as not to reconstruct all of these OSDs' data
>   and instead wait until the OS disk is replaced (I'm not sure if this is
>   possible, I should check the docs).
>- OS on a SATADOM ( http://www.innodisk.com/intel/product.html )
>   - PROS: we have all the disk slots available to use for OSDs.
>   - CONS: I have no experience with this kind of device, so I'm not
>   sure if they are trustworthy. These devices are fast but they are not RAID
>   protected; it's a single point of failure like the previous option.
>- OS boot from a SAN (this is the option I'm considering for the non-
>R730xd machines, which do not have the 2x2.5" slots on the back).
>   - PROS: all the disk slots are available to OSDs. The OS disk is
>   protected by RAID on the remote storage.
>   - CONS: we depend on the network. I guess the OS device does not
>   require a lot of traffic, and all the Ceph OSD network traffic should be
>   handled through another network card.
>
> Maybe I'm missing some other option; in that case please tell me, it would
> be helpful.
>
> It would be really helpful if somebody with experience booting the OS from a
> SAN could share their pros/cons, because that option is very interesting
> to me.
>
>
> 2016-08-14 14:57 GMT+02:00 Christian Balzer :
>
>>
>> Hello,
>>
>> I shall top-quote, summarize here.
>>
>> Firstly we have to consider that Ceph is deployed by people with a wide
>> variety of needs, budgets and most of all cluster sizes.
>>
>> Wido has the pleasure (or is that a nightmare? ^o^) of dealing with a really
>> huge cluster, thousands of OSDs and a correspondingly large number of nodes
>> (if memory serves me).
>>
>> While many others have comparatively small clusters, with decidedly fewer
>> than 10 storage nodes, like me.
>>
>> So the approach and philosophy is obviously going to differ quite a bit
>> on either end of this spectrum.
>>
>> If you start large (dozens of nodes and hundreds of OSDs), where only a
>> small fraction of your data (10% or less) is in a failure domain (host
>> initially), then you can play fast and loose and save a lot of money by
>> designing your machines and infrastructure accordingly.
>> Things like redundant OS drives, PSUs, even network links on the host, if
>> the cluster is big enough.
>> In a cluster of sufficient size, a node failure and the resulting data
>> movement is just background noise.
>>
>> OTOH with smaller clusters, you obviously want to avoid failures if at all
>> possible, since not only is the re-balancing going to be more painful, but
>> the resulting smaller cluster will also have less performance. [...]

Re: [ceph-users] what happen to the OSDs if the OS disk dies?

2016-08-16 Thread Félix Barbeira
Thanks everybody for the answers, it really helped me a lot. So, to sum up,
these are the options that I have:


   - OS in a RAID1.
  - PROS: the cluster is protected against OS failures. If one of these
  disks fails, it can be easily replaced because it is hot-swappable.
  - CONS: we are "wasting" 2 disk bays that could be dedicated to
  OSDs.

* In the case of the R730xd we have the option to put 2x2.5" SSD disks in the
slots on the back, like Brian says. For me this is clearly the best option.
We'll see if the finance department has the same opinion :)


   - OS on a single disk.
   - PROS: we are using only 1 disk slot. It could be a cheaper disk than
  the 4TB model because we are only going to use ~10GB.
  - CONS: the OS is not protected against failures, and if this disk
  fails, the OSDs in this machine (11) fail too. In this case we might try
  to adjust the configuration so as not to reconstruct all of these OSDs' data
  and instead wait until the OS disk is replaced (I'm not sure if this is
  possible, I should check the docs).
   - OS on a SATADOM ( http://www.innodisk.com/intel/product.html )
  - PROS: we have all the disk slots available to use for OSDs.
  - CONS: I have no experience with this kind of device, so I'm not sure
  if they are trustworthy. These devices are fast but they are not RAID
  protected; it's a single point of failure like the previous option.
   - OS boot from a SAN (this is the option I'm considering for the non-
   R730xd machines, which do not have the 2x2.5" slots on the back).
  - PROS: all the disk slots are available to OSDs. The OS disk is
  protected by RAID on the remote storage.
  - CONS: we depend on the network. I guess the OS device does not
  require a lot of traffic, and all the Ceph OSD network traffic should be
  handled through another network card.

Maybe I'm missing some other option; in that case please tell me, it would
be helpful.

It would be really helpful if somebody with experience booting the OS from a
SAN could share their pros/cons, because that option is very interesting
to me.


2016-08-14 14:57 GMT+02:00 Christian Balzer :

>
> Hello,
>
> I shall top-quote, summarize here.
>
> Firstly we have to consider that Ceph is deployed by people with a wide
> variety of needs, budgets and most of all cluster sizes.
>
> Wido has the pleasure (or is that a nightmare? ^o^) of dealing with a really
> huge cluster, thousands of OSDs and a correspondingly large number of nodes
> (if memory serves me).
>
> While many others have comparatively small clusters, with decidedly fewer
> than 10 storage nodes, like me.
>
> So the approach and philosophy is obviously going to differ quite a bit
> on either end of this spectrum.
>
> If you start large (dozens of nodes and hundreds of OSDs), where only a
> small fraction of your data (10% or less) is in a failure domain (host
> initially), then you can play fast and loose and save a lot of money by
> designing your machines and infrastructure accordingly.
> Things like redundant OS drives, PSUs, even network links on the host, if
> the cluster is big enough.
> In a cluster of sufficient size, a node failure and the resulting data
> movement is just background noise.
>
> OTOH with smaller clusters, you obviously want to avoid failures if at all
> possible, since not only is the re-balancing going to be more painful, but
> the resulting smaller cluster will also have less performance.
> This is why my OSD nodes have all the redundancy bells and whistles there
> are, simply because a cluster big enough to not need them would be both
> vastly more expensive despite cheaper individual node costs and also
> underutilized.
>
> Of course if you should grow to a certain point, maybe your next
> generation of OSD nodes can be built on the cheap w/o compromising safe
> operations.
>
> No matter what size your cluster is though, setting
> "mon_osd_down_out_subtree_limit" to an appropriate value (host for small
> clusters) is a good way to avoid re-balancing storms when a node (or some
> larger segment) goes down, given that recovering the failed part can be
> significantly faster than moving tons of data around.
> This of course implies 24/7 monitoring and access to the HW.
>
>
> As for dedicated MONs, I usually try to have the primary MON (lowest IP)
> on dedicated HW and to be sure that MONs residing on OSD nodes have fast
> storage and enough CPU/RAM to be happy even if the OSDs go on full spin.
>
> Which incidentally is why your shared MONs are likely a better fit for a
> HDD based OSD node than an SSD based one used for a cache pool, for example.
>
> Anyway, MONs are clearly candidates for having their OS (where /var/lib
> resides) on RAIDed, hot-swappable, fast, durable and power-loss-safe
> SSDs, just so you can avoid losing one and having to shut down the whole
> thing in the (unlikely) case of an SSD failure.
>
>
> Regards,
>
> 

Re: [ceph-users] what happen to the OSDs if the OS disk dies?

2016-08-14 Thread Christian Balzer

Hello,

I shall top-quote, summarize here.

Firstly we have to consider that Ceph is deployed by people with a wide
variety of needs, budgets and most of all cluster sizes.

Wido has the pleasure (or is that a nightmare? ^o^) of dealing with a really
huge cluster, thousands of OSDs and a correspondingly large number of nodes
(if memory serves me).

While many others have comparatively small clusters, with decidedly fewer
than 10 storage nodes, like me.

So the approach and philosophy is obviously going to differ quite a bit
on either end of this spectrum.

If you start large (dozens of nodes and hundreds of OSDs), where only a
small fraction of your data (10% or less) is in a failure domain (host
initially), then you can play fast and loose and save a lot of money by
designing your machines and infrastructure accordingly.
Things like redundant OS drives, PSUs, even network links on the host, if
the cluster is big enough.
In a cluster of sufficient size, a node failure and the resulting data
movement is just background noise.

OTOH with smaller clusters, you obviously want to avoid failures if at all
possible, since not only is the re-balancing going to be more painful, but
the resulting smaller cluster will also have less performance.
This is why my OSD nodes have all the redundancy bells and whistles there
are, simply because a cluster big enough to not need them would be both
vastly more expensive despite cheaper individual node costs and also
underutilized.

Of course if you should grow to a certain point, maybe your next
generation of OSD nodes can be built on the cheap w/o compromising safe
operations.

No matter what size your cluster is though, setting
"mon_osd_down_out_subtree_limit" to an appropriate value (host for small
clusters) is a good way to avoid re-balancing storms when a node (or some
larger segment) goes down, given that recovering the failed part can be
significantly faster than moving tons of data around. 
This of course implies 24/7 monitoring and access to the HW.
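
For example (just a minimal sketch, adjust to your own cluster), this could go
into ceph.conf on the monitor nodes, or be injected into running monitors:

    # ceph.conf on the MON nodes
    [mon]
    mon osd down out subtree limit = host
    # with this set, a whole host's worth of down OSDs will never be marked
    # "out" automatically, so no mass re-balancing starts on a node failure

    # or inject it at runtime (lost again on daemon restart unless it is
    # also added to ceph.conf):
    ceph tell mon.* injectargs '--mon_osd_down_out_subtree_limit=host'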


As for dedicated MONs, I usually try to have the primary MON (lowest IP)
on dedicated HW and to be sure that MONs residing on OSD nodes have fast
storage and enough CPU/RAM to be happy even if the OSDs go on full spin.

Which incidentally is why your shared MONs are likely a better fit for a
HDD based OSD node than an SSD based one used for a cache pool, for example.

Anyway, MONs are clearly candidates for having their OS (where /var/lib
resides) on RAIDed, hot-swappable, fast, durable and power-loss-safe
SSDs, just so you can avoid losing one and having to shut down the whole
thing in the (unlikely) case of an SSD failure.


Regards,

Christian

On Sat, 13 Aug 2016 09:43:26 +0200 w...@42on.com wrote:

> 
> 
> > On 13 Aug 2016, at 08:58, Georgios Dimitrakakis wrote:
> > 
> > 
> >>> On 13 Aug 2016, at 03:19, Bill Sharer wrote:
> >>> 
> >>> If all the system disk does is handle the o/s (ie osd journals are
> >>> on dedicated or osd drives as well), no problem. Just rebuild the
> >>> system and copy the ceph.conf back in when you re-install ceph.
> >>> Keep a spare copy of your original fstab to keep your osd filesystem
> >>> mounts straight.
> >> 
> >> With systems deployed with ceph-disk/ceph-deploy you no longer need a
> >> fstab. Udev handles it.
> >> 
> >>> Just keep in mind that you are down 11 osds while that system drive
> >>> gets rebuilt though. It's safer to do 10 osds and then have a
> >>> mirror set for the system disk.
> >> 
> >> In the years that I've run Ceph I've rarely seen OS disks fail. Why bother?
> >> Ceph is designed for failure.
> >> 
> >> I would not sacrifice an OSD slot for an OS disk. Also, let's say an
> >> additional OS disk is €100.
> >> 
> >> If you put that disk in 20 machines that's €2.000. For that money
> >> you can even buy an additional chassis.
> >> 
> >> No, I would run on a single OS disk. It fails? Let it fail. Re-install
> >> and you're good again.
> >> 
> >> Ceph makes sure the data is safe.
> >> 
> > 
> > Wido,
> > 
> > can you elaborate a little bit more on this? How does CEPH achieve that? Is 
> > it by redundant MONs?
> > 
> 
> No, Ceph replicates over hosts by default. So you can lose a host and the 
> other ones will have copies.
> 
> 
> > To my understanding the OSD mapping is needed to get the cluster back. In 
> > our setup (I assume in others as well) that is stored on the OS 
> > disk. Furthermore, our MONs are running on the same hosts as OSDs. So if the 
> > OS disk fails, not only do we lose the OSD host but we also lose the MON 
> > node. Is there another way to be protected from such a failure besides 
> > additional MONs?
> > 
> 
> Aha, MON on the OSD host. I never recommend that. Try to use dedicated 
> machines with a good SSD for MONs.
> 
> Technically you can run the MON on the OSD nodes, but I always try to avoid 
> it. It just isn't practical when stuff really goes wrong.
> 
> Wido
> 
> > We recently had 

Re: [ceph-users] what happen to the OSDs if the OS disk dies?

2016-08-13 Thread w...@42on.com


> On 13 Aug 2016, at 08:58, Georgios Dimitrakakis wrote:
> 
> 
>>> On 13 Aug 2016, at 03:19, Bill Sharer wrote:
>>> 
>>> If all the system disk does is handle the o/s (ie osd journals are
>>> on dedicated or osd drives as well), no problem. Just rebuild the
>>> system and copy the ceph.conf back in when you re-install ceph.
>>> Keep a spare copy of your original fstab to keep your osd filesystem
>>> mounts straight.
>> 
>> With systems deployed with ceph-disk/ceph-deploy you no longer need a
>> fstab. Udev handles it.
>> 
>>> Just keep in mind that you are down 11 osds while that system drive
>>> gets rebuilt though. It's safer to do 10 osds and then have a
>>> mirror set for the system disk.
>> 
>> In the years that I've run Ceph I've rarely seen OS disks fail. Why bother?
>> Ceph is designed for failure.
>> 
>> I would not sacrifice an OSD slot for an OS disk. Also, let's say an
>> additional OS disk is €100.
>> 
>> If you put that disk in 20 machines that's €2.000. For that money
>> you can even buy an additional chassis.
>> 
>> No, I would run on a single OS disk. It fails? Let it fail. Re-install
>> and you're good again.
>> 
>> Ceph makes sure the data is safe.
>> 
> 
> Wido,
> 
> can you elaborate a little bit more on this? How does CEPH achieve that? Is 
> it by redundant MONs?
> 

No, Ceph replicates over hosts by default. So you can lose a host and the 
other ones will have copies.
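
If you want to verify what your own cluster does, roughly (assuming a default
CRUSH map; rule names can differ per deployment):

    ceph osd tree
    # shows OSDs grouped under their hosts in the CRUSH hierarchy
    ceph osd crush rule dump
    # the default replicated rule contains a step like
    #   "op": "chooseleaf_firstn", ... "type": "host"
    # i.e. each replica is placed on a different host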


> To my understanding the OSD mapping is needed to get the cluster back. In 
> our setup (I assume in others as well) that is stored on the OS 
> disk. Furthermore, our MONs are running on the same hosts as OSDs. So if the OS 
> disk fails, not only do we lose the OSD host but we also lose the MON node. Is 
> there another way to be protected from such a failure besides additional MONs?
> 

Aha, MON on the OSD host. I never recommend that. Try to use dedicated machines 
with a good SSD for MONs.

Technically you can run the MON on the OSD nodes, but I always try to avoid it. 
It just isn't practical when stuff really goes wrong.

Wido

> We recently had a problem where a user accidentally deleted a volume. Of 
> course this has nothing to do with OS disk failure itself, but it prompted us 
> to start looking for other possible failures on our system that 
> could jeopardize data, and this thread got my attention.
> 
> 
> Warmest regards,
> 
> George
> 
> 
>> Wido
>> 
>> Bill Sharer
>> 
>>> On 08/12/2016 03:33 PM, Ronny Aasen wrote:
>>> 
 On 12.08.2016 13:41, Félix Barbeira wrote:
 
 Hi,
 
 I'm planning to make a ceph cluster but I have a serious doubt. At
 this moment we have ~10 servers DELL R730xd with 12x4TB SATA
 disks. The official ceph docs says:
 
 "We recommend using a dedicated drive for the operating system and
 software, and one drive for each Ceph OSD Daemon you run on the
 host."
 
 I could use for example 1 disk for the OS and 11 for OSD data. In
 the operating system I would run 11 daemons to control the OSDs.
 But... what happens to the cluster if the disk with the OS fails??
 Maybe the cluster thinks that 11 OSDs failed and tries to replicate
 all that data over the cluster... that sounds no good.
 
 Should I use 2 disks for the OS, making a RAID1? In that case I'm
 "wasting" 8TB just for the ~10GB that the OS needs.
 
 All the docs that I've been reading say Ceph has no single
 point of failure, so I think that this scenario must have an
 optimal solution; maybe somebody could help me.
 
 Thanks in advance.
 
 --
 
 Félix Barbeira.
>>> if you do not have dedicated slots on the back for OS disks, then I
>>> would recommend using SATADOM flash modules plugged directly into an
>>> internal SATA port in the machine. That saves you 2 slots for OSDs and
>>> they are quite reliable. You could even use 2 SD cards if your machine
>>> has the internal SD slot
>>> 
>>> 
>> http://www.dell.com/downloads/global/products/pedge/en/poweredge-idsdm-whitepaper-en.pdf
>>> [1]
>>> 
>>> kind regards
>>> Ronny Aasen
>>> 
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com [2]
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com [3]
>>> 
>> 
>> 
>> Links:
>> --
>> [1]
>> http://www.dell.com/downloads/global/products/pedge/en/poweredge-idsdm-whitepaper-en.pdf
>> [2] mailto:ceph-users@lists.ceph.com
>> [3] http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> [4] mailto:bsha...@sharerland.com
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list

Re: [ceph-users] what happen to the OSDs if the OS disk dies?

2016-08-13 Thread Georgios Dimitrakakis



On 13 Aug 2016, at 03:19, Bill Sharer wrote:


If all the system disk does is handle the o/s (ie osd journals are
on dedicated or osd drives as well), no problem. Just rebuild the
system and copy the ceph.conf back in when you re-install ceph.
Keep a spare copy of your original fstab to keep your osd filesystem
mounts straight.


With systems deployed with ceph-disk/ceph-deploy you no longer need a
fstab. Udev handles it.


Just keep in mind that you are down 11 osds while that system drive
gets rebuilt though. It's safer to do 10 osds and then have a
mirror set for the system disk.


In the years that I've run Ceph I've rarely seen OS disks fail. Why bother?
Ceph is designed for failure.

I would not sacrifice an OSD slot for an OS disk. Also, let's say an
additional OS disk is €100.

If you put that disk in 20 machines that's €2.000. For that money
you can even buy an additional chassis.

No, I would run on a single OS disk. It fails? Let it fail. 
Re-install

and you're good again.

Ceph makes sure the data is safe.



Wido,

can you elaborate a little bit more on this? How does CEPH achieve 
that? Is it by redundant MONs?


To my understanding the OSD mapping is needed to get the cluster back. 
In our setup (I assume in others as well) that is stored on the OS 
disk. Furthermore, our MONs are running on the same hosts as OSDs. So if 
the OS disk fails, not only do we lose the OSD host but we also lose the 
MON node. Is there another way to be protected from such a failure besides 
additional MONs?


We recently had a problem where a user accidentally deleted a volume. 
Of course this has nothing to do with OS disk failure itself, but it 
prompted us to start looking for other possible failures on our 
system that could jeopardize data, and this thread got my attention.



Warmest regards,

George



Wido

 Bill Sharer

 On 08/12/2016 03:33 PM, Ronny Aasen wrote:


On 12.08.2016 13:41, Félix Barbeira wrote:


Hi,

I'm planning to make a ceph cluster but I have a serious doubt. At
this moment we have ~10 servers DELL R730xd with 12x4TB SATA
disks. The official ceph docs says:

"We recommend using a dedicated drive for the operating system and
software, and one drive for each Ceph OSD Daemon you run on the
host."

I could use for example 1 disk for the OS and 11 for OSD data. In
the operating system I would run 11 daemons to control the OSDs.
But... what happens to the cluster if the disk with the OS fails??
Maybe the cluster thinks that 11 OSDs failed and tries to replicate
all that data over the cluster... that sounds no good.

Should I use 2 disks for the OS, making a RAID1? In that case I'm
"wasting" 8TB just for the ~10GB that the OS needs.

All the docs that I've been reading say Ceph has no single
point of failure, so I think that this scenario must have an
optimal solution; maybe somebody could help me.

Thanks in advance.

--

Félix Barbeira.

if you do not have dedicated slots on the back for OS disks, then I
would recommend using SATADOM flash modules plugged directly into an
internal SATA port in the machine. That saves you 2 slots for OSDs and
they are quite reliable. You could even use 2 SD cards if your machine
has the internal SD slot




http://www.dell.com/downloads/global/products/pedge/en/poweredge-idsdm-whitepaper-en.pdf

[1]

kind regards
Ronny Aasen

___
ceph-users mailing list
ceph-users@lists.ceph.com [2]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com [3]

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



Links:
--
[1]

http://www.dell.com/downloads/global/products/pedge/en/poweredge-idsdm-whitepaper-en.pdf
[2] mailto:ceph-users@lists.ceph.com
[3] http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[4] mailto:bsha...@sharerland.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] what happen to the OSDs if the OS disk dies?

2016-08-13 Thread w...@42on.com


> On 13 Aug 2016, at 03:19, Bill Sharer wrote:
> 
> If all the system disk does is handle the o/s (ie osd journals are on 
> dedicated or osd drives as well), no problem.  Just rebuild the system and 
> copy the ceph.conf back in when you re-install ceph.  Keep a spare copy of 
> your original fstab to keep your osd filesystem mounts straight.
> 

With systems deployed with ceph-disk/ceph-deploy you no longer need a fstab. 
Udev handles it.
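
After a re-install, it is roughly this (a sketch for the hammer/jewel-era
ceph-disk tooling; your deployment may differ): install the ceph packages, put
ceph.conf and the keyrings back, and the OSD data partitions are recognised by
their GPT partition type and activated by udev. To check or kick it manually:

    ceph-disk list
    # shows which partitions are ceph data / journal partitions
    ceph-disk activate-all
    # mounts and starts any prepared OSDs that udev has not already activated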

> Just keep in mind that you are down 11 osds while that system drive gets 
> rebuilt though.  It's safer to do 10 osds and then have a mirror set for the 
> system disk.
> 

In the years that I've run Ceph I've rarely seen OS disks fail. Why bother? Ceph is 
designed for failure.

I would not sacrifice an OSD slot for an OS disk. Also, let's say an additional OS 
disk is €100.

If you put that disk in 20 machines that's €2.000. For that money you can even 
buy an additional chassis.

No, I would run on a single OS disk. It fails? Let it fail. Re-install and 
you're good again.

Ceph makes sure the data is safe.

Wido

> Bill Sharer
> 
> 
>> On 08/12/2016 03:33 PM, Ronny Aasen wrote:
>>> On 12.08.2016 13:41, Félix Barbeira wrote:
>>> Hi,
>>> 
>>> I'm planning to make a ceph cluster but I have a serious doubt. At this 
>>> moment we have ~10 servers DELL R730xd with 12x4TB SATA disks. The official 
>>> ceph docs says:
>>> 
>>> "We recommend using a dedicated drive for the operating system and 
>>> software, and one drive for each Ceph OSD Daemon you run on the host."
>>> 
>>> I could use for example 1 disk for the OS and 11 for OSD data. In the 
>>> operating system I would run 11 daemons to control the OSDs. But... what 
>>> happens to the cluster if the disk with the OS fails?? Maybe the cluster 
>>> thinks that 11 OSDs failed and tries to replicate all that data over the 
>>> cluster... that sounds no good.
>>> 
>>> Should I use 2 disks for the OS, making a RAID1? In that case I'm "wasting" 
>>> 8TB just for the ~10GB that the OS needs.
>>> 
>>> All the docs that I've been reading say Ceph has no single point 
>>> of failure, so I think that this scenario must have an optimal solution; 
>>> maybe somebody could help me.
>>> 
>>> Thanks in advance.
>>> 
>>> -- 
>>> Félix Barbeira.
>>> 
>> if you do not have dedicated slots on the back for OS disks, then I would 
>> recommend using SATADOM flash modules plugged directly into an internal SATA 
>> port in the machine. That saves you 2 slots for OSDs and they are quite 
>> reliable. You could even use 2 SD cards if your machine has the internal SD slot 
>> 
>> http://www.dell.com/downloads/global/products/pedge/en/poweredge-idsdm-whitepaper-en.pdf
>> 
>> kind regards
>> Ronny Aasen
>> 
>> 
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] what happen to the OSDs if the OS disk dies?

2016-08-12 Thread Bill Sharer
If all the system disk does is handle the o/s (ie osd journals are on 
dedicated or osd drives as well), no problem.  Just rebuild the system 
and copy the ceph.conf back in when you re-install ceph.  Keep a spare 
copy of your original fstab to keep your osd filesystem mounts straight.
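
It is worth keeping a copy of those small bits somewhere off the node;
something like this (the paths are the usual defaults, and the archive name is
just a placeholder, adjust for your deployment):

    tar czf node42-ceph-backup.tgz \
        /etc/ceph/ceph.conf \
        /etc/ceph/*.keyring \
        /var/lib/ceph/bootstrap-osd/ceph.keyring \
        /etc/fstab
    # store the archive somewhere off the host, e.g. on your admin machine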


Just keep in mind that you are down 11 osds while that system drive gets 
rebuilt though.  It's safer to do 10 osds and then have a mirror set for 
the system disk.


Bill Sharer


On 08/12/2016 03:33 PM, Ronny Aasen wrote:

On 12.08.2016 13:41, Félix Barbeira wrote:

Hi,

I'm planning to make a ceph cluster but I have a serious doubt. At 
this moment we have ~10 servers DELL R730xd with 12x4TB SATA disks. 
The official ceph docs says:


"We recommend using a dedicated drive for the operating system and 
software, and one drive for each Ceph OSD Daemon you run on the host."


I could use for example 1 disk for the OS and 11 for OSD data. In the 
operating system I would run 11 daemons to control the OSDs. 
But... what happens to the cluster if the disk with the OS fails?? 
Maybe the cluster thinks that 11 OSDs failed and tries to replicate all 
that data over the cluster... that sounds no good.


Should I use 2 disks for the OS, making a RAID1? In that case I'm 
"wasting" 8TB just for the ~10GB that the OS needs.


All the docs that I've been reading say Ceph has no single 
point of failure, so I think that this scenario must have an optimal 
solution; maybe somebody could help me.


Thanks in advance.

--
Félix Barbeira.

if you do not have dedicated slots on the back for OS disks, then I 
would recommend using SATADOM flash modules plugged directly into an 
internal SATA port in the machine. That saves you 2 slots for OSDs and 
they are quite reliable. You could even use 2 SD cards if your machine 
has the internal SD slot


http://www.dell.com/downloads/global/products/pedge/en/poweredge-idsdm-whitepaper-en.pdf

kind regards
Ronny Aasen


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] what happen to the OSDs if the OS disk dies?

2016-08-12 Thread Ronny Aasen

On 12.08.2016 13:41, Félix Barbeira wrote:

Hi,

I'm planning to make a ceph cluster but I have a serious doubt. At 
this moment we have ~10 servers DELL R730xd with 12x4TB SATA disks. 
The official ceph docs says:


"We recommend using a dedicated drive for the operating system and 
software, and one drive for each Ceph OSD Daemon you run on the host."


I could use for example 1 disk for the OS and 11 for OSD data. In the 
operating system I would run 11 daemons to control the OSDs. 
But... what happens to the cluster if the disk with the OS fails?? Maybe 
the cluster thinks that 11 OSDs failed and tries to replicate all that 
data over the cluster... that sounds no good.


Should I use 2 disks for the OS, making a RAID1? In that case I'm 
"wasting" 8TB just for the ~10GB that the OS needs.


All the docs that I've been reading say Ceph has no single 
point of failure, so I think that this scenario must have an optimal 
solution; maybe somebody could help me.


Thanks in advance.

--
Félix Barbeira.

If you do not have dedicated slots on the back for OS disks, then I 
would recommend using SATADOM flash modules plugged directly into an 
internal SATA port in the machine. That saves you 2 slots for OSDs and 
they are quite reliable. You could even use 2 SD cards if your machine 
has the internal SD slot.


http://www.dell.com/downloads/global/products/pedge/en/poweredge-idsdm-whitepaper-en.pdf

kind regards
Ronny Aasen
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] what happen to the OSDs if the OS disk dies?

2016-08-12 Thread David Turner
Nothing actually happens to your OSDs if your OS drive fails.  To prevent 
unnecessary backfilling away from the server with the dead OS drive, you would set 
noout in the cluster, reinstall the OS on a good drive, install ceph on it, and 
then restart the server.  The OSDs have all of the information they need to 
bring themselves back up and into the cluster.  Once they are back up, you 
unset noout and are good to go.

If the drives had already been marked out of the cluster, then set noout and 
manually mark them in via `ceph osd in #` and proceed as above.  It is a very 
simple process to replace the OS drive of a storage node.
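
A rough sketch of that sequence (the OSD ids 12 and 13 are just placeholders,
and the restart command depends on your init system):

    ceph osd set noout
    # keeps the cluster from marking the node's OSDs "out" and backfilling

    # ...reinstall the OS, install ceph, restore ceph.conf and keyrings, reboot...

    ceph osd in 12 13
    # only needed if some OSDs were already marked out before you set noout

    systemctl start ceph-osd.target   # on a systemd box; then check "ceph -s"
    ceph osd unset noout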



David Turner | Cloud Operations Engineer | StorageCraft Technology Corporation
380 Data Drive Suite 300 | Draper | Utah | 84020
Office: 801.871.2760 | Mobile: 385.224.2943



If you are not the intended recipient of this message or received it 
erroneously, please notify the sender and delete it, together with any 
attachments, and be advised that any dissemination or copying of this message 
is prohibited.




From: ceph-users [ceph-users-boun...@lists.ceph.com] on behalf of Cybertinus 
[c...@cybertinus.nl]
Sent: Friday, August 12, 2016 7:31 AM
To: Félix Barbeira
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] what happen to the OSDs if the OS disk dies?

Hello Felix,

When you put your OS on a single drive and that drive fails, you will
lose all the OSDs on that machine, because the entire machine goes
down. The PGs that now miss a partner are going to be replicated again.
So, in your case, the PGs that are on those 11 OSDs.
This rebuilding doesn't start right away, so you can safely reboot an
OSD host without starting a major rebalance of your data.

I would put 2 drives in RAID1 if I were you. Putting 2 SSDs in the back
2,5" slots, like suggested by Brian, sounds like the best option to me.
This way you don't lose a massive amount of storage (2x10x4TB = 80 TB you
would lose otherwise, just for the OS installation...)

---
Kind regards,
Cybertinus

On 12-08-2016 13:41, Félix Barbeira wrote:

> Hi,
>
> I'm planning to make a ceph cluster but I have a serious doubt. At this
> moment we have ~10 servers DELL R730xd with 12x4TB SATA disks. The
> official ceph docs says:
>
> "We recommend using a dedicated drive for the operating system and
> software, and one drive for each Ceph OSD Daemon you run on the host."
>
> I could use for example 1 disk for the OS and 11 for OSD data. In the
> operating system I would run 11 daemons to control the OSDs. But... what
> happens to the cluster if the disk with the OS fails?? Maybe the cluster
> thinks that 11 OSDs failed and tries to replicate all that data over the
> cluster... that sounds no good.
>
> Should I use 2 disks for the OS, making a RAID1? In that case I'm
> "wasting" 8TB just for the ~10GB that the OS needs.
>
> All the docs that I've been reading say Ceph has no single
> point of failure, so I think that this scenario must have an optimal
> solution; maybe somebody could help me.
>
> Thanks in advance.
> --
> Félix Barbeira.
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] what happen to the OSDs if the OS disk dies?

2016-08-12 Thread Cybertinus

Hello Felix,

When you put your OS on a single drive and that drive fails, you will 
lose all the OSDs on that machine, because the entire machine goes 
down. The PGs that now miss a partner are going to be replicated again. 
So, in your case, the PGs that are on those 11 OSDs.
This rebuilding doesn't start right away, so you can safely reboot an 
OSD host without starting a major rebalance of your data.
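
(For reference, the delay behind that is the monitors' mon_osd_down_out_interval
option: how long a down OSD may stay down before it is marked "out" and
re-replication of its PGs begins. Something like this in ceph.conf would
control it; the value below is only an example:)

    [mon]
    # seconds before a down OSD is marked "out" and backfill starts;
    # pick a value that comfortably covers a reboot
    mon osd down out interval = 600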


I would put 2 drives in RAID1 if I were you. Putting 2 SSDs in the back 
2,5" slots, like suggested by Brian, sounds like the best option to me. 
This way you don't lose a massive amount of storage (2x10x4TB = 80 TB you 
would lose otherwise, just for the OS installation...)


---
Kind regards,
Cybertinus

On 12-08-2016 13:41, Félix Barbeira wrote:


Hi,

I'm planning to make a ceph cluster but I have a serious doubt. At this 
moment we have ~10 servers DELL R730xd with 12x4TB SATA disks. The 
official ceph docs says:


"We recommend using a dedicated drive for the operating system and 
software, and one drive for each Ceph OSD Daemon you run on the host."


I could use for example 1 disk for the OS and 11 for OSD data. In the 
operating system I would run 11 daemons to control the OSDs. But... what 
happens to the cluster if the disk with the OS fails?? Maybe the cluster 
thinks that 11 OSDs failed and tries to replicate all that data over the 
cluster... that sounds no good.


Should I use 2 disks for the OS, making a RAID1? In that case I'm 
"wasting" 8TB just for the ~10GB that the OS needs.


All the docs that I've been reading say Ceph has no single 
point of failure, so I think that this scenario must have an optimal 
solution; maybe somebody could help me.


Thanks in advance.
--
Félix Barbeira.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] what happen to the OSDs if the OS disk dies?

2016-08-12 Thread RDS
Mirror the OS disks, and use 10 disks for 10 OSDs.
> On Aug 12, 2016, at 7:41 AM, Félix Barbeira  wrote:
> 
> Hi,
> 
> I'm planning to make a ceph cluster but I have a serious doubt. At this 
> moment we have ~10 servers DELL R730xd with 12x4TB SATA disks. The official 
> ceph docs says:
> 
> "We recommend using a dedicated drive for the operating system and software, 
> and one drive for each Ceph OSD Daemon you run on the host."
> 
> I could use for example 1 disk for the OS and 11 for OSD data. In the 
> operating system I would run 11 daemons to control the OSDs. But... what 
> happens to the cluster if the disk with the OS fails?? Maybe the cluster 
> thinks that 11 OSDs failed and tries to replicate all that data over the 
> cluster... that sounds no good.
> 
> Should I use 2 disks for the OS, making a RAID1? In that case I'm "wasting" 
> 8TB just for the ~10GB that the OS needs.
> 
> All the docs that I've been reading say Ceph has no single point 
> of failure, so I think that this scenario must have an optimal solution, maybe 
> somebody could help me.
> 
> Thanks in advance.
> 
> -- 
> Félix Barbeira.
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

Rick Stehno


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] what happen to the OSDs if the OS disk dies?

2016-08-12 Thread Brian ::
Hi Felix

If you have an R730xd then you should have 2 x 2.5" slots on the back.
You can put SSDs in RAID1 for your OS there.



On Fri, Aug 12, 2016 at 12:41 PM, Félix Barbeira  wrote:
> Hi,
>
> I'm planning to make a ceph cluster but I have a serious doubt. At this
> moment we have ~10 servers DELL R730xd with 12x4TB SATA disks. The official
> ceph docs says:
>
> "We recommend using a dedicated drive for the operating system and software,
> and one drive for each Ceph OSD Daemon you run on the host."
>
> I could use for example 1 disk for the OS and 11 for OSD data. In the
> operating system I would run 11 daemons to control the OSDs. But... what
> happens to the cluster if the disk with the OS fails?? Maybe the cluster
> thinks that 11 OSDs failed and tries to replicate all that data over the
> cluster... that sounds no good.
>
> Should I use 2 disks for the OS, making a RAID1? In that case I'm "wasting"
> 8TB just for the ~10GB that the OS needs.
>
> All the docs that I've been reading say Ceph has no single point
> of failure, so I think that this scenario must have an optimal solution;
> maybe somebody could help me.
>
> Thanks in advance.
>
> --
> Félix Barbeira.
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com