Re: OSD replacement feature

2015-11-23 Thread David Zafman


That is correct.  The goal is to refill only the replacement OSD disk.
As long as the OSD is down for less than
mon_osd_down_out_interval (5 min default) or noout is set, no other data
movement will occur.
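
For concreteness, a minimal sketch of suppressing that extra movement during a
planned swap (standard ceph CLI; "mon.a" and "osd.3" below are just placeholders):

   # keep the cluster from automatically marking OSDs "out" while we work
   ceph osd set noout

   # optional: check the down->out timeout on one of the monitor hosts
   ceph daemon mon.a config get mon_osd_down_out_interval

   # ... stop osd.3, swap the physical disk, bring the new osd.3 back up ...

   # restore normal behaviour once the replacement has backfilled
   ceph osd unset noout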


David

On 11/23/15 8:45 PM, Wei-Chung Cheng wrote:

2015-11-21 1:54 GMT+08:00 David Zafman :

There are two reasons for having a ceph-disk replace feature.

1. To simplify the steps required to replace a disk
2. To allow a disk to be replaced proactively without causing any data
movement.

Hi David,

It would be great to avoid any data movement when we want to replace a
failed osd.

But I don't have any idea how to accomplish that; could you share some opinions?

I thought that if we want to replace a failed osd, we must move the object data
from the failed osd to the new (replacement) osd?

Or am I misunderstanding something?

thanks!!!
vicente


So keeping the osd id the same is required and is what motivated the feature
for me.

David


On 11/20/15 3:38 AM, Sage Weil wrote:

On Fri, 20 Nov 2015, Wei-Chung Cheng wrote:

Hi Loic and cephers,

Sure, I have time to help with (comment on) this disk-replacement feature.
This is a useful feature for handling disk failures :p

A simple procedure is described at http://tracker.ceph.com/issues/13732 :
1. Set the noout flag - if the broken osd is a primary osd, can we handle
that well?
2. Stop the osd daemon and wait until the osd is actually down (or
maybe use the deactivate option of ceph-disk).

These two steps seem OK.
Regarding the crush map, should we remove the broken osd?
If we do that, why set the noout flag at all? Removing the osd from the
crushmap still triggers re-balancing.

Right--I think you generally want to do either one or the other:

1) mark osd out, leave failed disk in place.  or, replace with new disk
that re-uses the same osd id.

or,

2) remove osd from crush map.  replace with new disk (which gets new osd
id).

I think re-using the osd id is awkward currently, so doing 1 and replacing
the disk ends up moving data twice.

sage


Re: OSD replacement feature

2015-11-20 Thread Sage Weil
On Fri, 20 Nov 2015, Wei-Chung Cheng wrote:
> Hi Loic and cephers,
> 
> Sure, I have time to help with (comment on) this disk-replacement feature.
> This is a useful feature for handling disk failures :p
> 
> A simple procedure is described at http://tracker.ceph.com/issues/13732 :
> 1. Set the noout flag - if the broken osd is a primary osd, can we handle that well?
> 2. Stop the osd daemon and wait until the osd is actually down (or
> maybe use the deactivate option of ceph-disk).
> 
> These two steps seem OK.
> Regarding the crush map, should we remove the broken osd?
> If we do that, why set the noout flag at all? Removing the osd from the
> crushmap still triggers re-balancing.

Right--I think you generally want to do either one or the other:

1) mark osd out, leave failed disk in place.  or, replace with new disk 
that re-uses the same osd id.

or,

2) remove osd from crush map.  replace with new disk (which gets new osd 
id).

I think re-using the osd id is awkward currently, so doing 1 and replacing 
the disk ends up moving data twice.
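
For reference, a rough sketch of the two paths with today's CLI (osd.5 and
/dev/sdX are placeholders, and the id re-use step in option 1 is exactly the
awkward part):

   # option 1: keep the crush entry, re-use the id
   ceph osd set noout
   systemctl stop ceph-osd@5        # or "stop ceph-osd id=5" on upstart
   # swap the physical disk, then re-create the OSD so it comes back as osd.5
   ceph osd unset noout

   # option 2: remove the osd entirely; the replacement gets a new id
   ceph osd out 5
   ceph osd crush remove osd.5
   ceph auth del osd.5
   ceph osd rm 5
   ceph-disk prepare /dev/sdX       # activation will assign the next free id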

sage


Re: OSD replacement feature

2015-11-20 Thread Wei-Chung Cheng
2015-11-20 19:38 GMT+08:00 Sage Weil :
> On Fri, 20 Nov 2015, Wei-Chung Cheng wrote:
>> Hi Loic and cephers,
>>
>> Sure, I have time to help with (comment on) this disk-replacement feature.
>> This is a useful feature for handling disk failures :p
>>
>> A simple procedure is described at http://tracker.ceph.com/issues/13732 :
>> 1. Set the noout flag - if the broken osd is a primary osd, can we handle that well?
>> 2. Stop the osd daemon and wait until the osd is actually down (or
>> maybe use the deactivate option of ceph-disk).
>>
>> These two steps seem OK.
>> Regarding the crush map, should we remove the broken osd?
>> If we do that, why set the noout flag at all? Removing the osd from the
>> crushmap still triggers re-balancing.
>
> Right--I think you generally want to do either one or the other:
>
> 1) mark osd out, leave failed disk in place.  or, replace with new disk
> that re-uses the same osd id.
>
> or,
>
> 2) remove osd from crush map.  replace with new disk (which gets new osd
> id).
>
> I think re-using the osd id is awkward currently, so doing 1 and replacing
> the disk ends up moving data twice.
>

Hi sage,

If the osd is in "DNE" status, will its weight be zero and trigger object
data movement?

In my test cases, I only remove the auth key and the osd-id (the osd is then
in "DNE" status), and then replace the disk with a new one that re-uses the
same osd-id.
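
Concretely, the steps in my test look roughly like this (osd.7 and /dev/sdX
are placeholders; it relies on the freed id being the lowest unused one, so
the new OSD picks it up again):

   # leave the crush entry in place so no re-balance is triggered,
   # but free the key and the id for the replacement to take over
   ceph auth del osd.7
   ceph osd rm 7                  # osd.7 now shows as DNE in "ceph osd tree"

   # prepare/activate the new disk; osd creation hands out the lowest
   # free id, so it should come back as osd.7
   ceph-disk prepare /dev/sdX
   ceph-disk activate /dev/sdX1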

The osd spends only a little time in the out state.
I think this procedure could reduce some redundant data movement.
What do you think of it? Or should it be done just as you say:
mark the osd out (deactivate/destroy, etc.) and replace with a new disk
that re-uses the same osd id?

BTW, if we just use ceph-deploy/ceph-disk, we cannot create an osd with a
specific osd-id. Should we implement support for that?

thanks!!!
vicente


Re: OSD replacement feature

2015-11-20 Thread David Zafman


There are two reasons for having a ceph-disk replace feature.

1. To simplify the steps required to replace a disk
2. To allow a disk to be replaced proactively without causing any data 
movement.


So keeping the osd id the same is required and is what motivated the 
feature for me.


David

On 11/20/15 3:38 AM, Sage Weil wrote:

On Fri, 20 Nov 2015, Wei-Chung Cheng wrote:

Hi Loic and cephers,

Sure, I have time to help with (comment on) this disk-replacement feature.
This is a useful feature for handling disk failures :p

A simple procedure is described at http://tracker.ceph.com/issues/13732 :
1. Set the noout flag - if the broken osd is a primary osd, can we handle that well?
2. Stop the osd daemon and wait until the osd is actually down (or
maybe use the deactivate option of ceph-disk).

These two steps seem OK.
Regarding the crush map, should we remove the broken osd?
If we do that, why set the noout flag at all? Removing the osd from the
crushmap still triggers re-balancing.

Right--I think you generally want to do either one or the other:

1) mark osd out, leave failed disk in place.  or, replace with new disk
that re-uses the same osd id.

or,

2) remove osd from crush map.  replace with new disk (which gets new osd
id).

I think re-using the osd id is awkward currently, so doing 1 and replacing
the disk ends up moving data twice.

sage