Re: [ceph-users] Give up on backfill, remove slow OSD

2016-10-08 Thread Iain Buclaw
On 3 October 2016 at 07:30, Ronny Aasen  wrote:
> On 22. sep. 2016 09:16, Iain Buclaw wrote:
>>
>> Hi,
>>
>> I currently have an OSD that has been backfilling data off it for a
>> little over two days now, and it's gone from approximately 68 PGs to
>> 63.
>>
>> As data is still being read from and written to it by clients whilst
>> I'm trying to get it out of the cluster, this is not helping at all.
>> I figured that it's probably best to cut my losses and force it out
>> entirely so that all new writes and reads to those PGs get redirected
>> elsewhere to a functional disk, and the rest of the recovery can
>> proceed without being blocked heavily by this one disk.
>>
>> Granted that objects and files have a 1:1 relationship, I can just
>> rsync the data to a new server and write it back into ceph afterwards.
>>
>> Now, I know that as soon as I bring down this OSD, the entire cluster
>> will stop operating.  So what's the swiftest method of telling the
>> cluster to forget about this disk and everything that may be stored on
>> it?
>>
>> Thanks
>>
>
>
> It should normally not get new writes if you want to remove it from
> the cluster. I assume you did something wrong here. How did you take
> the osd out of the cluster?
>
>
> generally my procedure for a working osd is something like
> 1. ceph osd crush reweight osd.X 0
>
> 2. ceph osd tree
>check that the osd in question actually has 0 weight (first number
> after the ID) and that the host weight has been reduced accordingly.
>

This was what was done.  However, it seems to take a very long time for
ceph to backfill millions of tiny objects, and the slow/bad SATA disk
only exacerbated the situation.
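One thing that can sometimes help while a reweighted OSD drains is
temporarily raising the backfill/recovery throttles; a rough sketch
(the values below are only illustrative, and higher settings trade
client latency for recovery speed):

  # check the current per-OSD limit via the admin socket on that OSD's host
  ceph daemon osd.X config get osd_max_backfills

  # temporarily allow more concurrent backfills and recovery ops
  ceph tell osd.* injectargs '--osd-max-backfills 4 --osd-recovery-max-active 8'

  # afterwards, inject the original values back (check them first with config get)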

>
> 3. ls /var/lib/ceph/osd/ceph-X/current ; periodically
>wait for the osd to drain; there should be no PG directories n.xxx_head
> or n.xxx_TEMP left. this will take a while depending on the size of the
> osd. in reality i just wait until the disk usage graph settles, then
> doublecheck with ls.
>

With some of the OSDs, there were some PGs still left - probably
orphaned somehow in the confusion when rebalancing away from full
disks.  It's not a problem for me though, as I just scanned the
directories and rewrote the files back into ceph.  It's rather nice to
see that they all got written into the same PGs that I recovered them
from.  So ceph is predictable in where it writes data; I wonder if I
could use that to my advantage somehow. :-)
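That predictability is just CRUSH doing its job: the object name hashes
to a PG, and the PG maps to a set of OSDs, so the placement can be
queried ahead of time. For example (pool and object names here are
placeholders):

  # show which PG, and which OSD(s), a given object name maps to
  ceph osd map data somefile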


> 4: once empty I mark the osd out, stop the process, and remove the osd from
> the cluster as written in the documentation
>  - ceph auth del osd.x
>  - ceph osd crush remove osd.x
>  - ceph osd rm osd.x
>

This is how to remove an OSD, not how to remove and recreate a PG. ;-)

>
>
> PS: if your cluster stops operating when an osd goes down, you have
> something else fundamentally wrong. you should look into this as a
> separate case as well.
>

osd pool default size = 1

I'm still trying to work out the best method of handling this.  As I
understand it, if an OSD goes down, all requests to it get stuck in a
queue, and that slows down operation latency to the functional OSDs.
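If the space cost is acceptable, the usual escape from that queueing
behaviour is a second replica, so another OSD can serve the PGs while
one is down. A sketch, assuming a pool called "data" (whether two
copies are affordable is of course the real question with size = 1):

  # replicate an existing pool twice; existing objects get copied by recovery
  ceph osd pool set data size 2

  # keep serving I/O from a single surviving copy while the other rebuilds
  ceph osd pool set data min_size 1

  # and for pools created later, in ceph.conf:
  #   osd pool default size = 2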

In any case, it eventually finished backfilling just over a week
later, and I managed to speed up the backfilling of the SSD disks by
starting a balance on the btrfs disk metadata, which freed up around
1.5 TB of space back to ceph.
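For anyone wanting to repeat that trick: btrfs tends to keep
mostly-empty chunks allocated, and a filtered balance hands the unused
space back. A rough sketch, assuming the OSD's filesystem is mounted at
the default /var/lib/ceph/osd/ceph-X path (the usage threshold is only
illustrative):

  # compare allocated vs. actually used space per chunk type
  btrfs filesystem df /var/lib/ceph/osd/ceph-X

  # compact metadata chunks that are less than half full
  btrfs balance start -musage=50 /var/lib/ceph/osd/ceph-X

  # the same for data chunks, if needed (can be I/O heavy on a busy OSD)
  btrfs balance start -dusage=50 /var/lib/ceph/osd/ceph-X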

Being blocked by backfill_toofull probably didn't help overall
recovery either, as the cluster had to juggle going from 30 full disks
to adding 15 temporary disks, then adding a further 8 when proper
servers were made available to handle the overflow, and finally
removing the 15 temporaries.
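For completeness, backfill_toofull is governed by the per-OSD
backfill-full threshold, so a nearly full cluster can sometimes be
nudged along by evening out utilisation or, as a last resort, briefly
raising that threshold. A sketch using the pre-Luminous (2016-era)
option name; 0.90 is an illustrative value that leaves very little
headroom:

  # move PGs away from the most-utilised OSDs
  ceph osd reweight-by-utilization

  # temporarily raise the ratio above which backfill into an OSD is refused
  ceph tell osd.* injectargs '--osd-backfill-full-ratio 0.90'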

-- 
Iain Buclaw

*(p < e ? p++ : p) = (c & 0x0f) + '0';


Re: [ceph-users] Give up on backfill, remove slow OSD

2016-10-02 Thread Ronny Aasen

On 22. sep. 2016 09:16, Iain Buclaw wrote:

> Hi,
>
> I currently have an OSD that has been backfilling data off it for a
> little over two days now, and it's gone from approximately 68 PGs to
> 63.
>
> As data is still being read from and written to it by clients whilst
> I'm trying to get it out of the cluster, this is not helping at all.
> I figured that it's probably best to cut my losses and force it out
> entirely so that all new writes and reads to those PGs get redirected
> elsewhere to a functional disk, and the rest of the recovery can
> proceed without being blocked heavily by this one disk.
>
> Now, I know that as soon as I bring down this OSD, the entire cluster
> will stop operating.  So what's the swiftest method of telling the
> cluster to forget about this disk and everything that may be stored on
> it?
>
> Thanks




It should normally not get new writes if you want to remove it
from the cluster. I assume you did something wrong here. How did you
take the osd out of the cluster?



generally my procedure for a working osd is something like
1. ceph osd crush reweight osd.X 0

2. ceph osd tree
   check that the osd in question actually has 0 weight (first number
after the ID) and that the host weight has been reduced accordingly.


3. ls /var/lib/ceph/osd/ceph-X/current ; periodically
   wait for the osd to drain; there should be no PG directories
n.xxx_head or n.xxx_TEMP left. this will take a while depending on the
size of the osd. in reality i just wait until the disk usage graph
settles, then doublecheck with ls.


4: once empty I mark the osd out, stop the process, and remove the osd
from the cluster as written in the documentation (see the sketch below):

 - ceph auth del osd.x
 - ceph osd crush remove osd.x
 - ceph osd rm osd.x
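
A rough consolidation of steps 3 and 4 as a script; this is only a
sketch, it assumes a filestore OSD mounted at the default
/var/lib/ceph/osd/ceph-$ID path, and it should only be run once the
crush weight is already 0:

  #!/bin/sh
  # drain_and_remove.sh <osd-id> -- hypothetical helper, not an official tool
  ID=$1

  # wait until no PG directories (n.xxx_head / n.xxx_TEMP) remain on the OSD
  while ls -d /var/lib/ceph/osd/ceph-$ID/current/*_head \
              /var/lib/ceph/osd/ceph-$ID/current/*_TEMP 2>/dev/null | grep -q .; do
      echo "osd.$ID still holds PGs, waiting..."
      sleep 300
  done

  # the OSD is empty: mark it out, stop the daemon, and remove it
  ceph osd out $ID
  systemctl stop ceph-osd@$ID        # or: /etc/init.d/ceph stop osd.$ID
  ceph auth del osd.$ID
  ceph osd crush remove osd.$ID
  ceph osd rm osd.$ID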



PS: if your cluster stops operating when an osd goes down, you have
something else fundamentally wrong. you should look into this as a
separate case as well.


kind regards
Ronny Aasen







[ceph-users] Give up on backfill, remove slow OSD

2016-09-22 Thread Iain Buclaw
Hi,

I currently have an OSD that has been backfilling data off it for a
little over two days now, and it's gone from approximately 68 PGs to
63.

As data is still being read from and written to it by clients whilst
I'm trying to get it out of the cluster, this is not helping at all.
I figured that it's probably best to cut my losses and force it out
entirely so that all new writes and reads to those PGs get redirected
elsewhere to a functional disk, and the rest of the recovery can
proceed without being blocked heavily by this one disk.

Granted that objects and files have a 1:1 relationship, I can just
rsync the data to a new server and write it back into ceph afterwards.
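
Since each file maps to a single RADOS object, re-injecting the rsynced
data is essentially a loop of rados put calls. A minimal sketch,
assuming the files were synced into /srv/restore (a placeholder path)
and that object names are simply the file names (pool name "data" is
also a placeholder):

  # re-import rsynced files as RADOS objects, one object per file
  cd /srv/restore
  for f in *; do
      rados -p data put "$f" "$f"
  done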

Now, I know that as soon as I bring down this OSD, the entire cluster
will stop operating.  So what's the swiftest method of telling the
cluster to forget about this disk and everything that may be stored on
it?

Thanks

-- 
Iain Buclaw

*(p < e ? p++ : p) = (c & 0x0f) + '0';